- Comparison of the Network Time Protocol and Digital Time Service
-
- Editor's Note
-
- This document includes transcripts of an exchange of messages between
- Dave Mills of UDel, Dennis Ferguson of UToronto, Joe Comuzzi of DEC and
- Mike Soha of DEC. The issue under discussion is a comparison and
- evaluation of the Network Time Protocol (NTP) and the Digital Time
- Service (DTS). It is important to point out that these messages are
- informal, sometimes opinionated and may contain errors of judgement and
- technical detail. Some points of confusion and misstatement in the
- initial exchanges are clarified as the discussion moves on. The messages
- have been lightly edited to remove nonrelevant asides and repetitive
- material, as well as to unify format style.
-
- This document is provided for informal, collaborative use in research
- only and should not be quoted or cited in a professional publication.
-
- David L. Mills
- Electrical Engineering Department
- University of Delaware
- 1 September 1990
-
- ------------------------------------------------------------------------
-
- Draft document distributed to the NTP engineering group on 12 February
- 1990:
-
- A Comparison of the Network Time Protocol and Digital Time Service
-
- David L. Mills
- Electrical Engineering Department
- University of Delaware
-
- 1. Introduction
-
- The Digital Time Service (DTS) for the Digital Network Architecture
- (DECnet) is intended to synchronize time in computer networks ranging in
- size from local to wide-area. As such it is intended to provide service
- comparable to the Network Time Protocol (NTP) for the Internet
- architecture. This memorandum compares the architectures, functions and
- design issues of NTP and DTS with respect to correctness, stability,
- accuracy and reliability. It is based on information available in the
- various RFCs [MIL89], [MIL90b], journal articles [MIL90a], [MIL90b] and
- other sources [MIL90c] for NTP and on the DTS functional specification
- [DEC89].
-
- In this memorandum the stability of a clock is how well it can maintain
- a constant frequency, the accuracy is how well its frequency and time
- compare with national standards and the precision is how precisely these
- quantities can be resolved within a particular timekeeping system. The
- offset of two clocks is the time difference between them, while the skew
- is the frequency difference (first derivative of offset with time). Real
- clocks exhibit some variation in skew (second derivative of offset with
- time), which is called drift. The correctness of a timekeeping system is
- the degree to which it indicates valid UTC according to some criteria,
- while its reliability is the fraction of the time it can be kept
- operating and connected in the network.
-
- Local clocks are maintained at designated time servers, which are
- timekeeping systems belonging to a synchronization subnet in which each
- server measures the offsets between its local clock and the clocks of
- other servers in the subnet. In this memorandum to synchronize frequency
- means to adjust the clocks in the subnet to run at the same frequency,
- to synchronize time means to set them to agree at a particular epoch
- with respect to Coordinated Universal Time (UTC), as provided by
- national standards, and to synchronize clocks means to synchronize them
- in both frequency and time. The goal of a distributed timekeeping
- service such as NTP and DTS is to synchronize the clocks in all
- participating servers and clients so that all are correct, indicate the
- same time relative to UTC, and maintain specified measures of stability,
- accuracy and reliability.
-
- By international agreement the primary frequency reference for our
- civilization is the atomic oscillator. The standard second is determined
- as a specified number of atomic cycles, the standard day as 86,400
- standard seconds and the standard (Julian) year as 365.25 standard days.
- In order to maintain the nominal solar year, the Gregorian calendar
- mandates the insertion of leap days, which in the absence of political
- upheaval can be determined far in advance. In order to maintain the
- nominal solar day, leap seconds must be inserted at times which cannot
- be reliably determined in advance. The basis of civil time is the UTC
- clock, which first ticked at 0h on 1 January 1972. Without knowledge of
- prior leap seconds, an event determined on the UTC timescale can appear
- some 15 seconds late on the Julian timescale.
-
- Both the NTP and DTS timescales are based on the UTC timescale and are
- intended to run at atomic frequency with leap seconds inserted at times
- decreed by international agreement; however, they are each calibrated in
- different increments starting from different historic dates. While they
- both are intended to thrive in large, undisciplined network systems,
- they differ considerably in their statistical models, algorithms and
- service metrics. In following sections the similarities and differences
- are discussed at length, along with implications bearing on correctness,
- stability, accuracy and reliability.
-
- 2. Basic Principles and Functionality
-
- Both NTP and DTS are designed for use in proliferated computer networks
- with possibly many embedded local nets interconnected by routers,
- gateways and bridges and involving both broadcast and point-to-point
- transmission media. This section summarizes the architecture and service
- objectives of NTP and DTS in turn.
-
- 2.1. The Network Time Protocol
-
- The NTP synchronization subnet consists of a tree-structured graph with
- nodes representing time servers and edges representing the transmission
- paths between them. The root nodes of the tree are represented by
- designated primary servers which synchronize to a radio broadcast or
- calibrated atomic clock. The remaining nodes are designated secondary
- servers which synchronize to other servers, both primary and secondary.
- The number of subnet hops between a particular server and a primary
- server determines the stratum level of that server. All servers, except
- possibly those at the leaves of the tree, have identical functionality
- and can operate simultaneously as clients of the next lower stratum
- level and servers for the next higher one.
-
- Servers, both primary and secondary, typically run NTP with several
- other servers at the same or lower stratum levels; however, a selection
- algorithm attempts to select the most accurate and reliable server or
- set of servers from which to actually synchronize the local clock. The
- selection algorithm, described in more detail later in this document,
- uses a maximum-likelihood clustering algorithm to determine the best
- from among a number of possible servers. The synchronization subnet
- itself is automatically constructed from among the available paths using
- the distributed Bellman-Ford routing algorithm [BER87], in which the
- distance metric is modified hop count.
-
- NTP operates in various modes in order to improve efficiency on local
- wires with many clients. These support operation in conventional RPC
- client/server modes, as well as symmetric and multicast modes. The
- symmetric modes provide a flexible backup function in which the
- direction of time synchronization between a pair of servers can reverse
- due to loss of reachability or quality of service along one path or
- another in the synchronization subnet. The multicast mode is designed to
- provide time to personal workstations where the full accuracy of the
- other modes is not required.
-
- The NTP specification includes no architected procedures by which
- servers obtain the addresses of other servers, apart from
- configuration files and public bulletin boards. While servers
- passively respond to requests from
- other servers, they must be configured in order to actively probe other
- servers. Servers configured as active poll other servers continuously,
- while servers configured as passive poll only when polled by another
- server. There are no provisions in the present protocol to dynamically
- activate some servers should other servers fail.
-
- In response to stated needs for security features, NTP includes an
- optional cryptographic authentication mechanism. NTP also includes an
- optional comprehensive remote monitoring mechanism found necessary for
- the detection and repair of various problems in server and network
- configuration and operation. It is anticipated that, when generic
- features capable of these functions have been developed and deployed in
- the Internet, the NTP authentication and monitoring mechanisms may be
- withdrawn.
-
- 2.2. The Digital Time Service
-
- In DTS a synchronization subnet consists of a structured graph with
- nodes consisting of clerks, servers, couriers and time providers. With
- respect to the NTP nomenclature, a time provider is a primary server, a
- courier is a secondary server intended to import time from one or more
- distant primary servers for local redistribution and a server is
- intended to provide time for possibly many end nodes or clerks. Time
- providers, servers and couriers are evidently generic, in that all
- perform similar functions and have similar or identical internal
- structure. The intent is that time providers can be set from radios,
- telephone calls to NIST [NBS88] or even manually.
-
- As in NTP, DTS clients and servers periodically request the time from
- other servers, although the subnet has only a limited ability to
- reconfigure in the event of failure. The selection algorithm used in
- DTS, which is based on the work presented in Marzullo's dissertation and
- reported in [MAR85], will be discussed in detail later in this document.
-
- On local nets DTS servers multicast to each other in order to construct
- lists of servers available on the local wire. Clerks multicast requests
- for these lists, which are returned in monocast mode similar to ARP in
- the Internet. Couriers consult the network directory system to find
- global time providers. For local-net operation more than one server can
- be configured to operate as a courier, but only one will actually
- operate as a courier at a time. There does not appear to be a multicast
- function in which a personal workstation could obtain time simply by
- listening on the local wire without first obtaining a list of local
- servers.
-
- In the DTS model the directory, authentication and management functions
- are provided by other layers, entities and protocols in the DECnet
- architecture. As evident from other documents in the DECnet
- specification suite, these functions are highly developed and
- integrated in the architecture and presumably provide functionality
- equivalent to the NTP authentication and monitoring mechanisms.
-
- 3. Statistical Models and Data Representation
-
- Perhaps the widest departure between the NTP and DTS philosophies is the
- basic underlying statistical model. NTP is based on maximum-likelihood
- principles and statistical mechanics, where errors are expressed in
- terms of expectations. DTS is based on provable assertions about the
- correctness of a set of mutually suspicious clocks, where errors are
- expressed as a set of computable bounds on maximum time and frequency
- offsets. This section explores these models and how they affect the
- quality of service.
-
- 3.1. Statistical Models
-
- The conventional analytical model for real synchronized clocks [ALL74],
- [MIT80] consists of a set of oscillators connected by transmission
- paths. The oscillators are characterized by a set of random variables
- that describe their intrinsic time and frequency offsets relative to a
- reference timescale. In this analysis the time and frequency of an
- oscillator cannot be known exactly and must be described using random
- variables with assumed or derived probability density functions. It is
- possible to quantify absolute upper and lower limits of accuracy only
- with respect to these functions. In conventional analysis the
- transmission paths between the clocks are modelled as stochastic delays,
- with assumed distributions, usually of exponential type. These paths are
- used to exchange timing information and adjust the time and frequency
- offsets of each oscillator. The behavior of individual clocks, both in
- accuracy and stability, can then be characterized using standard
- engineering tools, such as the theory of phase-locked loops familiar to
- communications engineers [LIN80], [SMI86].
-
- In the model used by DTS and several others in the literature (see
- [MIL90b]) a number of assumptions are made, explicitly or implicitly,
- about the shape of the probability density functions describing the
- inherent accuracy and stability of clocks, as well as the delays on the
- transmission paths connecting them. DTS assumes the reading error
- (offset relative to UTC) of a clock has a computable bound. In addition,
- DTS assumes the frequency error (called drift in the DTS functional
- specification) is bounded by an implementation constant. A correct clock
- never strays outside these bounds, which are computed from the inherent
- characteristics of the clock, the inherited characteristics of its
- selected synchronization source, the measured propagation delay and the
- accumulated error since last updated. If it is assumed that real clocks
- and transmission paths can be modelled reliably in this fashion, then
- the DTS algorithm can maintain a system of correct clocks.
-
- The philosophy inherent in the NTP algorithms is to consider all
- possible information available in the stochastic model and measured
- statistics to arrive at a probabilistic conclusion. The time delivered
- by an NTP subnet is intended to be the most likely measurement in a
- probabilistic system in which all measurements have been weighted toward
- the most likely outcome. However, there is no guarantee that all clocks
- in the system are valid in the sense of provable correctness. No attempt
- is made to determine correctness other than on a statistical basis.
-
- On the other hand, DTS starts with a set of assumptions on the bounds of
- error and growth of error bounds with time. The clock selection
- algorithm is based on an intersection operation which preserves
- correctness according to these assumptions. As long as the underlying
- probability distributions can be bounded absolutely, DTS will deliver
- provably correct time, if at all. But, real distributions seldom behave
- this way, especially in the Internet, where the distributions can have
- surprisingly long tails. Thus, there will almost always exist a tail in
- the distribution which can be truncated only at the expense of some
- error. In other words, the DTS assumptions must be considered valid only
- in the context of the probability distributions actually observed.
-
- In summary, NTP attempts to deliver the best estimate of time and
- frequency with presumably the lowest estimated error, but cannot
- guarantee the correctness of the indication relative to an arbitrary set
- of rigid assumptions. DTS attempts to deliver correct time according to
- stated assumptions, but correctness can be guaranteed only with respect
- to these assumptions and these assumptions can be guaranteed only on a
- probabilistic basis.
-
- 3.2. Maximum Likelihood Estimation
-
- It is possible to obtain various measures of expected error when
- processing timekeeping data and use these measures to establish
- preference in the various estimation algorithms. Called maximum-
- likelihood estimation, these techniques are widely used in signal
- processing and communication systems. For instance, experiments show
- [MIL90b] that, in a list of delay/offset measurements ordered by
- roundtrip delay, the most accurate offsets are usually associated with
- the lower delays. When selecting the best offset sample from a single
- clock or when selecting the best set of clocks in an ensemble, NTP gives
- the lower-delay samples greater weight in the filtering, selection and
- combining procedures.
-
- Now consider the DTS selection algorithm, which is due to Marzullo
- [MAR85]. The following diagram shows two scenarios (1) and (2) involving
- three clocks A, B and C. Each of the dashed lines represents the
- interval of time offsets considered correct for that clock. As suggested
- in [MAR85], the probability of a particular clock reading can be assumed
- independent and uniformly distributed over the entire interval.
-
- A +--------------------------+ A +-+
- B +--------------------------+ B +-+
- C +-+ C +-+
-
- Result +-+ Result +-+
-
- (1) (2)
-
- According to the algorithm, the outcome is determined in both cases by
- the intersection of the three intervals. However, once the intersection
- has been formed, all probabilistic information of antecedent
- distributions is lost. Put another way, the probability of the joint
- event consisting of the intersection of all three intervals is far lower
- in (1) than in (2). In NTP this information has proved highly useful in
- mitigating clock selection; however, the information is lost in DTS.
-
- 3.3. Representation of Timestamp Data
-
- Both NTP and DTS exist to provide timestamps to some specified accuracy
- and precision. NTP represents time as a 64-bit quantity in seconds and
- fractions, with 32 bits as integral seconds and 32 bits as the fraction
- of a second. This provides resolution to about 200 picoseconds and
- rollover ambiguity of about 136 years. The origin of the timescale and
- its correspondence with UTC, atomic time and Julian days is documented
- in [MIL90c]. DTS represents time to a precision of 100 nanoseconds,
- although there appears to be no specified maximum value.
-
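- As an illustration only (this sketch is not part of either
- specification), the 64-bit NTP format can be packed and unpacked as
- follows:
-
```python
# Illustrative packing of the 64-bit NTP timestamp format: 32 bits of
# integral seconds and 32 bits of binary fraction, giving the roughly
# 200-picosecond (2**-32 s) resolution noted above; the integer field
# rolls over every 2**32 s, about 136 years.

def to_ntp_timestamp(seconds):
    """Pack seconds-since-era into one 64-bit integer."""
    integral = int(seconds) & 0xFFFFFFFF
    fraction = int((seconds - int(seconds)) * (1 << 32)) & 0xFFFFFFFF
    return (integral << 32) | fraction

def from_ntp_timestamp(ts):
    """Unpack a 64-bit NTP timestamp back into seconds."""
    return (ts >> 32) + (ts & 0xFFFFFFFF) / (1 << 32)

# A half-second fraction survives the round trip exactly:
assert from_ntp_timestamp(to_ntp_timestamp(1.5)) == 1.5
```
-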
- The origin of the present 136-year NTP time cycle is specified as the
- first instant of the tropical year that began this century, which is an
- astronomically verifiable epoch. The origin of the DTS timescale appears
- to be the implementation of the papal bull establishing the Gregorian
- calendar in 1582, although this instant is verifiable only by historic
- record. However, UTC did not exist prior to 1972 and the Gregorian
- calendar did not achieve widespread use until the early years of the
- twentieth century. While not specified, presumably DTS reckons the
- years, leap years and Julian days of the conventional calendar as
- described in the NTP specification [MIL89] and further elaborated in
- [MIL90c]. In retrospect, it might have been better if both NTP and DTS
- had adopted Modified Julian Day (MJD) numbering directly and avoided
- tropical centuries and papal bulls altogether.
-
- With respect to applications involving precision time data, such as
- national standards laboratories, resolutions finer than the 100
- nanoseconds provided by DTS are required. Present timekeeping systems
- for space science and navigation can maintain time to better than 30
- nanoseconds, while range data over interplanetary distances can be
- determined to less than a nanosecond. While an ordinary application
- running on an ordinary computer could not reasonably be expected to
- produce or interpret precise timestamps anywhere near the 200-picosecond
- limit of an NTP timestamp, there are many applications where a precision
- timestamp could be rendered by some other means and propagated via a
- computer and network to some other place for processing. One such
- application could well be synchronizing navigation systems like LORAN-C,
- where the timestamps would be obtained directly from station timekeeping
- equipment.
-
- 3.4. Time Zones and Leap Seconds
-
- NTP specifically and intentionally has no provisions anywhere in the
- protocol to specify time zones or zone names. The service is designed to
- deliver UTC seconds and Julian days without respect to geographic
- position, political boundary or local custom. Conversion of NTP
- timestamp data to system format is expected to occur at the presentation
- layer; however, provisions are made to supply leap-second information to
- the presentation layer so that network time in the vicinity of leap
- seconds can be properly coordinated. DTS includes provision for time
- zones and presumably summer/winter adjustments in the form of a
- numerical time offset from UTC and arbitrary character-string label;
- however, it is not obvious how to distribute and activate this
- information in a coordinated manner.
-
- NTP and DTS differ somewhat in the treatment of leap seconds. In DTS the
- normal growth in error bounds in the absence of corrections will
- eventually cause the bounds to include the new timescale and adjust
- gradually as in normal operation. Recognizing that this can take a long
- time, DTS includes special provisions that expand the error bounds at
- times when leap seconds are expected to occur, which can shorten
- the period for convergence significantly. However, until the correction
- is determined and during the convergence interval the accuracy of the
- local clock with respect to other network clocks may be considerably
- degraded.
-
- The accuracy and stability expectations of NTP preclude this approach.
- In NTP the incidence of leap seconds is assumed available in advance at
- all primary servers and distributed automatically throughout the
- remainder of the synchronization subnet as part of normal protocol
- operations. Thus, every server and client in the subnet is aware at the
- instant the leap second is to take effect, and steps the local clock
- simultaneously with all other servers in the subnet. As a result, the local
- clock accuracy and stability are preserved before, during and after the
- leap insertion.
-
- 3.5. Determining Time Offset and Roundtrip Delay
-
- At first glance it may appear that NTP and DTS have quite different
- models to determine delay, offset and error budgets. Both involve the
- exchange of messages between two servers (or a client and a server).
- Both attempt to measure not only the clock offsets, but the roundtrip
- delay and, in addition, attempt to estimate the error. The diagrams
- below, in which time flows downward, illustrate a typical message
- exchange in each protocol between servers A and B.
-
- A B A B
-
- | | | |
- t1 |--------->| t2 t1 |--------->|--- t4
- | | | | |
- | | | |
- | | | | w
- | | | |
- | | | | |
- t4 |<---------| t3 t8 |<---------|---
- | | | |
-
- NTP DTS
-
- In NTP the roundtrip delay d and clock offset c of server B relative to
- A is
-
- d = (t4-t1) - (t3-t2)
- c = ((t2-t1) + (t3-t4))/2.
-
- This method amounts to a continuously sampled, returnable-time system,
- which is used in some digital telephone networks [LIN80]. Among its
- advantages are that both server A and server B can simultaneously
- calculate the delay and offset knowing only the latest time of arrival
- and the three preceding timestamps, which in NTP are carried with the
- message and can also be used for authentication purposes. The order and
- timing of the messages are unimportant and reliable delivery is not
- required.
-
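- The formulas above can be illustrated with a short sketch; the
- numbers in the example are invented for illustration:
-
```python
# NTP roundtrip-delay and offset calculation from one message exchange:
# t1 = request sent by A, t2 = request received by B,
# t3 = reply sent by B, t4 = reply received by A (all in seconds).

def ntp_sample(t1, t2, t3, t4):
    """Return (delay d, offset c) of server B's clock relative to A's."""
    d = (t4 - t1) - (t3 - t2)        # roundtrip delay, B's hold time removed
    c = ((t2 - t1) + (t3 - t4)) / 2  # offset, assuming symmetric path delays
    return d, c

# Hypothetical exchange: B's clock runs 0.05 s ahead of A's and each
# one-way path takes 0.01 s.
d, c = ntp_sample(t1=100.00, t2=100.06, t3=100.07, t4=100.03)
# d is approximately 0.02, c approximately 0.05
```
-
- Note that both endpoints can compute d and c from the same four
- timestamps, which is the symmetry noted above.
-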
- In DTS server A remembers timestamp t1 (other numbered events shown in
- the DTS functional specification are not shown in the diagram) and
- expects server B to return t4, the time of arrival of the request from
- server A, and w, the time elapsed until the departure of the response to
- server A. In principle, although NTP is symmetric and DTS is not, the
- two schemes are computationally equivalent and either can compute delay
- and offset using similar formulas.
-
- Both NTP and DTS have to do a little dance in order to account for
- timing errors due to the precisions of the local clocks and the
- frequency offsets (usually minor) over the transaction interval itself.
- A purist might argue that the expressions given above for delay and
- offset are not strictly accurate unless the probability density
- functions for the path delays are known, properly convolved and
- expectations computed, but this applies to both NTP and DTS. The point
- should be made, however, that correct functioning of DTS requires
- reliable bounds on measured roundtrip delay, as this enters into the
- error budget used to construct intervals over which a clock can be
- considered correct. This is not nearly as important in NTP, since the
- accuracy and stability of the local clock is largely due to the local
- clock model, which is described later in this document.
-
- 4. Processing Algorithms
-
- At the heart of any time synchronization system are the algorithms which
- process the data received from possibly many servers, filter out noise
- in the form of outliers and select the best from among a population of
- mutually suspicious clocks. Issues of the NTP and DTS data filtering,
- clock selection and combining algorithms are compared and discussed in
- following subsections.
-
- 4.1. Data Filtering
-
- In both the NTP and DTS models a number of offsets are collected from
- each of possibly many servers. In principle, the accuracy and precision
- of measurements made between any pair of servers can be improved by
- selecting or combining a number of sequential samples in various ways.
- In NTP a comprehensive program of analysis and experiment lasting
- several years and using many Internet transmission paths involving local
- nets and wide-area nets resulted in what is called the minimum-filter
- algorithm [MIL90b]. Reduced to essentials, this algorithm selects from
- among the last n samples of delay/offset collected from a single server
- the sample with minimum delay and presents the associated offset as the
- time offset estimate for that server. This is done separately for each
- server on a continuous basis at intervals from about one minute to about
- 17 minutes. As part of this continuing process, an error estimate called
- the sample dispersion is constructed as the sum of weighted differences
- of the resulting offset estimate relative to the other n-1 samples
- considered in the selection.
-
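- Reduced to code, the filter might look like the following sketch; the
- geometric weight of one-half per step is an assumption of this sketch,
- not a constant taken from the specification:
-
```python
# Sketch of the minimum-filter idea: from the last n (delay, offset)
# samples of one server, take the offset of the minimum-delay sample
# and form a dispersion from weighted differences against the rest.

def minimum_filter(samples):
    """samples: list of (delay, offset); return (offset estimate, dispersion)."""
    ordered = sorted(samples)            # ascending roundtrip delay
    best_offset = ordered[0][1]          # offset of the minimum-delay sample
    # Sample dispersion: weighted differences against the other n-1 samples
    # (the 1/2-per-step weight is an illustrative choice).
    dispersion = sum(abs(offset - best_offset) * 0.5 ** (i + 1)
                     for i, (_, offset) in enumerate(ordered[1:]))
    return best_offset, dispersion

est, disp = minimum_filter([(0.030, 0.012), (0.010, 0.005), (0.020, 0.007)])
# est == 0.005, the offset of the minimum-delay sample
```
-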
- In DTS the same algorithm used to select a cohort set of servers from a
- population possibly including faulty servers (see next section) is used
- to filter the samples from a single server. This approach was discarded
- early in the NTP design experience for two reasons. First, the
- statistical problem of selecting good samples from a sequence produced
- by a single server has the very considerable advantage that the
- underlying probability distribution can be assumed stationary and
- represented by robust statistics such as those produced by nonlinear
- trimmed-mean filters, median filters and the NTP minimum filter.
- Second, the
- problem of selecting good clocks from bad involves a multivariate
- statistical model characteristic of pattern analysis and classification.
- It has been the NTP experience that algorithms that work well on one of
- these two problems usually do not do well on the other.
-
- 4.2. Clock Selection and Combining
-
- In both NTP and DTS the problem persists that there is often no clear
- distinction between truechimers and falsetickers, so that "correct"
- clocks can be deduced only on a probabilistic basis and then only
- according to arbitrary criteria. It could be argued on the basis of
- experience, for example, that various kinds of faulty behavior are more
- likely than others. For instance, a faulty clock is more likely to
- indicate hot ones, cold zeros or an integral number of seconds,
- minutes, days or years in error, rather than fractional parts
- of these quantities. A clock that comes up with no prior hint of correct
- time has a vanishing probability of coming up anywhere near UTC by
- simple nature of the measurement space. In NTP, for example, this would
- amount to guessing the correct 256-ms window in an interval of 136
- years. Interesting observations on these points, including the use of an
- NTP timestamp as a cryptographic one-time pad, can be found in the
- references.
-
- NTP maintains for each server both the total estimated roundtrip delay
- to the root of the synchronization subnet (synchronizing distance) and
- the total dispersion to the root of the synchronization subnet
- (synchronizing dispersion). These quantities are
- included in the message exchanges and form the basis of the likelihood
- calculations. Since they always increase from the root, they can be used
- to calculate accuracy and reliability estimates, as well as to manage
- the subnet topology to reduce errors and resist destructive timing
- loops.
-
- In NTP the selection algorithm determines one or a number of
- synchronization candidates based on empirical rules and maximum-
- likelihood techniques. A combining algorithm determines the local-clock
- adjustment using a weighted-average procedure in which the weights are
- determined by offset sample dispersion. The algorithm begins by
- constructing a list of candidate clocks sorted first by stratum and then
- by total synchronization dispersion to the root. The list is then pruned
- from the end to a manageable size and to eliminate very noisy and
- probably defective clocks. On the assumption that a valid
- synchronization candidate will always be at the lowest or next from
- lowest stratum, the list is truncated at the first entry where the
- number of different strata on the list exceeds two. This particular
- procedure and choice of parameters have been found to produce reliable
- synchronization candidates over a wide range of system environments
- while minimizing the "pulling" effect of high-stratum, high-dispersion
- servers, especially when a large number of servers are involved.
-
- The next step is designed to detect falsetickers or other conditions
- which might result in gross errors. The pruned and truncated candidate
- list is re-sorted in the order first by stratum and then by total
- synchronizing distance to the root; that is, in order of decreasing
- likelihood. A similar procedure is also used in Marzullo's MM algorithm
- [MAR85]. Next, each entry is inspected in turn and a weighted error
- estimate computed relative to the remaining entries on the list. The
- entry with maximum estimated error is discarded and the process repeats.
- The procedure terminates when the estimated error of each entry
- remaining on the list is less than a quantity depending on the intrinsic
- precisions of the local clocks involved.
-
- The NTP selection algorithm is designed to favor those servers near the
- head (maximum likelihood) of the candidate list, which are at the lowest
- stratum and lowest delay and presumably can provide the most accurate
- time. With proper selection of weighting factors, outliers are discarded
- from the tail of the list, unless some other entry disagrees
- significantly with respect to the remaining entries, in which case that
- entry is discarded first. The offsets of the surviving servers are
- statistically equivalent, so any of them can be chosen to adjust the
- local clock. Some implementations [MIL90c] combine them using a
- weighted-average algorithm similar to that used by national standards
- laboratories [ALL74], in which the offsets of the servers remaining on
- the list are weighted by sample dispersion to produce a combined
- estimate.
-
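- As an illustration, such a combining step might weight each surviving
- offset by the inverse of its sample dispersion; the exact weighting
- used in [MIL90c] may differ from this sketch:
-
```python
# Sketch of dispersion-weighted combining: inverse-dispersion weights
# are an assumption of this sketch, chosen so that low-dispersion
# (more trusted) servers count for more.

def combine(offsets, dispersions):
    """Weighted average of offsets, weighting low-dispersion servers more."""
    weights = [1.0 / d for d in dispersions]
    return sum(w * o for w, o in zip(weights, offsets)) / sum(weights)

# The low-dispersion second server pulls the estimate its way:
combined = combine(offsets=[0.010, 0.002], dispersions=[0.004, 0.001])
# combined is approximately 0.0036, closer to 0.002 than the plain
# average of 0.006
```
-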
- DTS uses a rather different technique, where the goal emphasized is
- validated correctness relative to a set of specified criteria. A compact
- way of expressing this algorithm is the following. Each clock is
- expressed as an estimate C and an inherent or inherited error E, which
- defines an interval [C-E,C+E]. A clock is correct if the interval
- includes UTC and incorrect if not; however, it is not known in advance
- which is the case. Consider M such intervals with j possibly faulty
- servers and arrange the lower endpoints on a list by increasing endpoint
- value. Starting from the beginning of the list, find the first point
- which is contained in at least M-f intervals. This defines the lower
- boundary of the correct interval. If no such point is found, increase f
- by one and try again. A similar procedure is used for the upper limit of
- the correct interval.
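-
- A minimal sketch of this endpoint sweep, assuming each clock is given
- as a (C, E) pair; the variable names are mine, not the DTS
- specification's.
-
```python
def dts_interval(clocks, f):
    """Sketch of the endpoint sweep described above.

    clocks: list of (C, E) pairs, each asserting UTC lies in [C-E, C+E].
    f: number of clocks initially assumed faulty.
    Returns (lo, hi), an interval asserted to contain UTC.
    """
    M = len(clocks)
    lower = sorted(c - e for c, e in clocks)
    upper = sorted(c + e for c, e in clocks)

    def covered(p):
        # Number of intervals containing the point p.
        return sum(1 for c, e in clocks if c - e <= p <= c + e)

    while f < M:
        # Lower boundary: first lower endpoint, in increasing order,
        # contained in at least M-f intervals; upper boundary: the
        # symmetric sweep over upper endpoints in decreasing order.
        lo = next((p for p in lower if covered(p) >= M - f), None)
        hi = next((p for p in reversed(upper) if covered(p) >= M - f), None)
        if lo is not None and hi is not None:
            return lo, hi
        f += 1  # no agreement; assume one more fault and retry
    raise ValueError("no agreement among clocks")
```
-
- For example, three clocks asserting [0,10], [1,3] and [7,9] with f = 1
- yield the interval (1, 9), each endpoint of which lies in two of the
- three intervals.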
-
- The error E used to construct the above intervals is determined both by
- the intrinsic characteristics of the clock oscillator (precision), the
- reading delay between the client request and server response and the
- frequency offset over the interval since the oscillator was last
- adjusted. Once established by the above algorithm, the correct interval
- grows with time, possibly engulfing intervals previously considered
- faulty. The interval between client requests is carefully computed to
- prevent the correct interval from exceeding a configuration parameter.
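-
- One plausible reading of that computation, sketched below: the
- inaccuracy grows at the assumed worst-case oscillator drift rate, so
- the next request must come before the configured bound is exceeded.
- The function name and the 100-ppm rate are my illustrative choices.
-
```python
def next_sync_interval(current_inaccuracy, max_inaccuracy,
                       drift_rate=100e-6):
    """Sketch: time until the next synchronization request so that
    the growing inaccuracy never exceeds the configured bound.
    Inaccuracies are in seconds; drift_rate is in s/s (100 ppm here,
    an illustrative worst-case oscillator tolerance).
    """
    budget = max_inaccuracy - current_inaccuracy
    if budget <= 0:
        return 0.0  # bound already exceeded; resynchronize now
    return budget / drift_rate
```
-
- With a 10-ms inaccuracy, a 100-ms bound and 100-ppm drift, the next
- request is due within 900 seconds.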
-
- The fundamental assumption upon which the DTS is founded is Marzullo's
- proof that a set of M clocks synchronized by the above algorithm, where
- no more than f clocks are faulty, delivers an interval including UTC.
- The algorithm is simple, both to express and to implement, and involves
- only one sorting step instead of two as in NTP. However, consider the
- following scenario with M = 3, f = 1 and containing three intervals A, B
- and C:
-
- A +--------------------------+
- B +----+
- C +----+
-
- Result +-----================-----+
-
- Using the algorithm described in the DTS functional specification, both
- the lower and upper endpoints of interval A are in M-f = 2 intervals,
- so the resulting interval is coincident with A. However, there remains
- the interval marked "=" which contains points not contained in at least
- two other intervals. The DTS document mentions this interesting fact,
- but makes a quite reasonable choice to avoid multiple intervals in favor
- of a single one, even if that does in principle violate the correctness
- assumptions. The purist would say that a choice has to be made, either
- the left intersection or the right one, perhaps mitigated by maximum-
- likelihood principles. This example would seem to violate the
- fundamental assumptions on which the proof of correctness of Marzullo's
- algorithm rests.
-
- In fact, quite similar algorithms were once used in predecessors of NTP
- [MIL85a], [MIL85b], but discarded because they produced inadequate
- accuracy and stability. One of the problems with algorithms such as this
- is that normal variations in network path delay cause frequent occasions
- when one clock or another pops in or out of one correction interval or
- another, causing the interval to change size and resulting in large
- phase noise of the local clock. While the phenomenon is much reduced in
- the present NTP design, some "clockhopping" does occur and is the
- primary contributor in NTP to clock instability. Another reason these
- algorithms were abandoned was that incidental error estimates, such as
- the size of the correct interval, cannot be used either for likelihood
- estimation or to organize the subnet topology.
-
- 5. Local Clocks
-
- There are fundamental differences between the NTP and DTS local-clock
- models. The DTS (and Unix) model assumes the local clock runs at a rate
- R determined by its integral quartz resonator, which may be manufactured
- to a tolerance no better than 100 ppm, which corresponds to several
- seconds per day. A correction is introduced as an offset, which causes a
- number of fractional seconds to be added or subtracted from the local
- clock. In order to avoid large discontinuities and ensure monotonicity,
- the rate at which the clock can be adjusted is fixed by an
- implementation constant e (called tickadj in the Unix kernel). The
- local clock thus runs at three rates: R, R+e and R-e, so that the
- magnitude of correction determines not the magnitude of the rate, but
- the length of time over which the rate is continued. Once the correction
- has been completed, the clock reverts to rate R and continues
- indefinitely at that rate. From the DTS functional specification it
- appears that the designers expect that corrections be recomputed on the
- order of one every 15 minutes to once per hour.
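-
- The three-rate model reduces to a small calculation: the offset
- determines how long the altered rate persists. A sketch, with an
- illustrative value of e:
-
```python
def slew_schedule(offset, e=0.0005):
    """Sketch of the three-rate adjustment model described above.

    offset: correction in seconds (positive means the clock is behind).
    e: the fixed adjustment rate, analogous to the Unix kernel
       constant; the 500-ppm value here is illustrative.
    Returns (sign, duration): the clock runs at rate R + sign*e for
    duration seconds, then reverts to its natural rate R.
    """
    if offset == 0:
        return 0, 0.0
    sign = 1 if offset > 0 else -1
    # The magnitude of the correction sets not the rate but how long
    # the altered rate persists.
    return sign, abs(offset) / e
```
-
- At e = 500 ppm, slewing out a 100-ms correction occupies the clock
- for 200 seconds.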
-
- Early in the evolution of NTP the above model was discarded as the
- result of experience with sometimes broken servers and always noisy
- transmission paths. The factor missing in the DTS design is a capability
- to compensate for frequency errors as well as time errors. The NTP
- local-clock model includes provisions to estimate the frequency error
- and automatically adjust the local clock by introducing additional
- offset corrections on a regular basis. This results in much reduced
- frequency errors, on the order of 0.01 ppm or a few milliseconds per
- day, in the absence of external corrections. In principle, and depending
- on the inherent stability of the local clock, the interval between
- corrections can be extended to the order of hours and even days.
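-
- The effect of adding a frequency integrator can be seen in a toy
- simulation; the loop gains below are arbitrary demonstration values,
- not the NTP loop constants:
-
```python
def simulate(true_freq_err=50e-6, interval=64.0, polls=300,
             phase_gain=0.5, freq_gain=0.1):
    """Toy simulation of phase-plus-frequency correction. Each
    measured offset both slews the phase and updates a frequency
    estimate (the second integrator that makes this a type-II loop).
    """
    error = 0.0     # local clock minus UTC, in seconds
    freq_est = 0.0  # learned frequency compensation, in s/s
    for _ in range(polls):
        # Between polls the clock drifts at the uncompensated rate.
        error += (true_freq_err - freq_est) * interval
        # Frequency correction: integrate the measured error.
        freq_est += freq_gain * error / interval
        # Phase correction: slew out part of the measured error.
        error -= phase_gain * error
    return error, freq_est
```
-
- Starting from a 50-ppm oscillator error, the frequency estimate
- converges to the true error and the residual time error decays toward
- zero, which is what allows the interval between corrections to be
- stretched.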
-
- However, if some or all of the servers in the synchronization subnet are
- to incorporate frequency management, the clock adjustment dynamics must
- be controlled and held to specified tolerances; otherwise, some servers
- can become unstable and experience wild time and frequency gyrations. In
- control theory the DTS design is described as a type-I feedback loop,
- which is unconditionally stable, while the NTP design is described as a
- type-II loop, which can become unstable under some conditions. This
- requires specification of the adjustment rate, including the value of e
- for Unix-style clocks, as well as a specified mechanism to adjust the
- frequency as required. While this has proved a surmountable problem with
- NTP daemons for Unix, it has been suggested that appropriately specified
- functionality be incorporated directly into the design of the kernel
- timekeeping facility, in which case DTS and possibly other schemes could
- benefit as well.
-
- For the most accurate and precise time and frequency using ordinary
- hardware components it is necessary to fine-tune the adjustment dynamics
- to match expected local clock jitter, wander and drift. The NTP model
- incorporates this functionality using a drift estimate (kurtosis) which
- dynamically adjusts the loop bandwidth. It is arguable whether diligent
- pursuit of the highest quality service always justifies the additional
- complexity, but it is certainly necessary if accuracies in the order of
- a few milliseconds and stabilities in the order of a few milliseconds
- per day are required, especially for the primary servers. Since
- stability of the subnet itself is not critically dependent on this
- feature, it can be considered optional in the specification and
- implementation.
-
- In point of fact, the local clock model described in the NTP
- specification is listed as optional in the same spirit as the model
- described in the DTS functional description. As such, the local clock
- can in principle be considered implementation specific and not part of
- the formal specification. However, as demonstrated above, frequency
- compensation requires the local clock adjustment to be carefully
- specified and implemented. The NTP mechanism has been carefully
- analyzed, simulated, implemented and deployed in the Internet, but DTS
- has not. The unavoidable conclusion is that NTP and DTS implementations
- cannot safely interoperate in subnets of any size, unless the DTS local
- clock adjustment mechanism is suitably modified.
-
- 5.2. Monotonicity
-
- It is an uncontested fact that computer systems can be badly disrupted
- should apparent time appear to warp (jump) backwards, rather than always
- tick forward as our universe requires. Both NTP and DTS take explicit
- precautions to avoid the local clock running backwards or large warps
- when running forwards. However, both NTP and DTS models recognize that
- there are some extreme conditions in which it is better to warp
- backwards or forwards, rather than allow the adjustment procedure to
- continue for an outrageously long time. The local clock is warped if the
- correction exceeds an implementation constant, +-128 milliseconds for
- NTP and ten minutes for DTS. The large difference between the NTP and
- DTS values is attributed to the accuracy models assumed.
-
- For most servers and transmission paths in the Internet an offset spike
- over +-128 milliseconds (following filtering, selection and combining
- operations) is so rare as to be almost negligible. For the few
- exceptions operating in
- extreme dispersive conditions, such as statistical multiplexors or
- switched landline/satellite paths, the 128-ms value can be increased by
- a configuration parameter. The problem with selecting larger values is
- that the time taken to effect a spike correction can be rather long,
- during which the clock accuracy specification can be exceeded.
- Obviously, the same considerations apply in DTS.
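-
- The step-or-slew decision in both protocols reduces to a threshold
- test; a sketch, with the NTP default:
-
```python
def apply_correction(offset, step_threshold=0.128):
    """Sketch of the step-or-slew decision described above.

    offset: computed clock correction in seconds.
    step_threshold: 0.128 s is the NTP default (DTS uses 600 s);
    it can be raised by configuration on highly dispersive paths.
    """
    if abs(offset) > step_threshold:
        return "step", offset  # warp the clock once
    return "slew", offset      # adjust gradually, preserving monotonicity
```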
-
- 5.3. Epoch Determination
-
- The DTS functional specification points out an interesting requirement,
- common to other network management and routing protocols, for circuit
- breakers between a set of clocks synchronized to each other but
- known to be faulty and another set synchronized to each other but known
- to be correct. The problem is to avoid infection of the set of correct
- clocks by timestamps from the faulty set. DTS provides this circuit
- breaker in the form of an epoch number, which is incremented when a new
- subnet is created. Once the first member of the new subnet has been
- created, others can be transferred from the faulty to the correct subnet
- one at a time so that correctness is preserved.
-
- In NTP the circuit breaker is provided by the authentication mechanism,
- which can operate with any of several encryption keys. When a new subnet
- is created, all that is required is to change the key of a known correct
- server and then change the keys of other servers one at a time.
- Eventually all servers are running with the new key and the subnet
- continues as usual. The same scheme has also been used when testing new
- implementations and on occasion to isolate known falsetickers which
- cannot otherwise be partitioned from the Internet.
-
- 5.4. Dynamic Polling Intervals
-
- In both NTP and DTS servers exchange timekeeping data at regular
- intervals. In NTP the polling intervals are dynamically adjusted from
- about one minute to about 17 minutes, depending on sample dispersion and
- local clock stability. In DTS the intervals are fixed, with lower bounds
- of two minutes (servers) and 15 minutes (clerks) and upper bounds
- depending on error tolerance. Relatively frequent polls are necessary
- both to confirm reachability (for hot standby service), as well as to
- maintain accuracy within specified limits (DTS) and to maintain optimum
- subnet stability (NTP).
-
- At the present stage of protocol refinement, NTP polling intervals can
- be expected to be somewhat less than DTS intervals. The reason for this
- is the emphasis in NTP on the highest attainable accuracy and stability,
- which requires compensation for frequency errors as well as timing
- errors. Stability of the closed-loop system without bandwidth control
- presently requires a maximum polling interval in the order of one minute
- for those transmission paths actually used to maintain synchronization;
- however, the polling interval is increased typically to 17 minutes for
- other paths, which statistically account for more than two-thirds of the
- total number of paths. However, with the introduction of bandwidth
- control in the latest NTP implementations, the polling interval can be
- increased to 17 minutes on all paths, with the expectation of even
- larger increases at the higher stratum levels.
-
- 6. Summary and Conclusions
-
- The service objectives of both NTP and DTS are substantially the same:
- to deliver correct, accurate, stable and reliable time throughout the
- synchronization subnet. However, as demonstrated in this document, these
- objectives are not all simultaneously achievable. For instance, in a
- system of real clocks some may be correct according to an established
- and trusted criterion (truechimers) and some may not (falsetickers). In
- the models used by NTP and DTS the distinction between these two groups
- is made on the basis of different clustering techniques, neither of
- which is statistically infallible. A succinct way of putting it might be
- to say that NTP attempts to deliver the most accurate, stable and
- reliable time according to statistical principles, while DTS attempts to
- deliver validated time according to correctness principles, but possibly
- at the expense of accuracy and stability.
-
- In both the NTP and DTS models the problem is to determine which subset
- of possibly many clocks represents the truechimers and which do not. An
- interesting observation about both NTP and DTS is that neither attempts
- to assess the relative importance of misses (mislabelling a truechimer
- as a falseticker) relative to false alarms (mislabelling a falseticker
- as a truechimer). In signal detection analysis this is established by
- the likelihood ratio, with high ratios favoring misses over false
- alarms. In retrospect, it could be said that NTP assumes a somewhat
- lower likelihood ratio than does DTS.
-
- It might be concluded from the discourse in this document that, if the
- service objective is the highest accuracy and precision, then the
- protocol of choice is NTP; however, if the objective is correctness,
- then the protocol of choice is DTS. However, the discussion in Section
- 4.2 casts some doubt on either this claim, the DTS functional
- specification or this investigator's interpretation of it. It is
- certainly true that DTS is "simple" and NTP is "complex," but these are
- relative terms and the complexity of NTP did not result from accident.
- That even the complexity of NTP is surmountable is demonstrated by the
- fact that over 2000 NTP-synchronized servers chime each other in the
- Internet now.
-
- The most serious departure between NTP and DTS, and the reason that
- subnets incorporating large numbers of either protocol cannot
- interoperate safely without further consideration, is the fact that in
- NTP it has been found necessary to implement local clock frequency
- compensation and in DTS it has not. Whether or not the additional rigor
- in specification and implementation can be justified depends on the
- expectation of the time-service customers and their applications.
- Frequency compensation not only provides the capability to survive long
- server outages while keeping good local time, but is instrumental in
- reducing timing noise and maintaining the highest accuracy and
- stability.
-
- The widespread deployment of NTP in the Internet seems to confirm that
- distributed Internet applications can expect that reliable, synchronized
- time can be maintained to within about two orders of magnitude less than
- the overall roundtrip delay to the root of the synchronization subnet.
- For most places in the Internet today that means overall network time
- can be confidently maintained to a few tens of milliseconds [MIL90a].
- While the behavior of large-scale deployment of DTS in internet
- environments is unknown, it is unlikely that it can provide comparable
- performance in its present form. With respect to the future refinement
- of DTS, should this be considered, it is inevitable that the same
- performance obstacles and implementation choices found by NTP will be
- found by DTS as well.
-
- 7. References
-
- [ALL74] Allan, D.W., J.E. Gray and H.E. Machlan. The National Bureau of
- Standards atomic time scale: generation, stability, accuracy and
- accessibility. In: Blair, B.E. (Ed.). Time and Frequency Theory and
- Fundamentals. National Bureau of Standards Monograph 140, U.S.
- Department of Commerce, 1974, 205-231.
-
- [BER87] Bertsekas, D., and R. Gallager. Data Networks. Prentice-Hall,
- Englewood Cliffs, NJ, 1987.
-
- [DIG89] Digital Equipment Corporation. Digital Time Service functional
- specification, version T1.0.5. Digital Equipment Corporation,
- December 1989.
-
- [LIN80] Lindsay, W.C., and A.V. Kantak. Network synchronization of
- random signals. IEEE Trans. Communications COM-28, 8 (August 1980),
- 1260-1266.
-
- [MAR85] Marzullo, K., and S. Owicki. Maintaining the time in a
- distributed system. ACM Operating Systems Review 19, 3 (July 1985),
- 44-54.
-
- [MIL85a] Mills, D.L. Algorithms for synchronizing network clocks. DARPA
- Network Working Group Report RFC-956, M/A-COM Linkabit, September
- 1985.
-
- [MIL85b] Mills, D.L. Experiments in network clock synchronization. DARPA
- Network Working Group Report RFC-957, M/A-COM Linkabit, September
- 1985.
-
- [MIL89] Mills, D.L. Network Time Protocol (Version 2) specification and
- implementation. DARPA Network Working Group Report RFC-1119,
- University of Delaware, September 1989.
-
- [MIL90a] Mills, D.L. On the accuracy and stability of clocks
- synchronized by the Network Time Protocol in the Internet system.
- ACM Computer Communication Review 20, 1 (January 1990), 65-75.
-
- [MIL90b] Mills, D.L. Internet time synchronization: the Network Time
- Protocol. IEEE Trans. Communications (to appear). See also: DARPA
- Network Working Group Report RFC-1129, University of Delaware,
- October 1989.
-
- [MIL90c] Mills, D.L. The NTP Local-Clock Model and Control Algorithms.
- (unpublished), February 1990.
-
- [MIT80] Mitra, D. Network synchronization: analysis of a hybrid of
- master-slave and mutual synchronization. IEEE Trans. Communications
- COM-28, 8 (August 1980), 1245-1259.
-
- [NBS88] Automated Computer Time Service (ACTS). NBS Research Material
- 8101, U.S. Department of Commerce, 1988.
-
- [SMI86] Smith, J. Modern Communications Circuits. McGraw-Hill, New York,
- NY, 1986.
-
- ------------------------------------------------------------------------
-
- Date: Fri, 16 Mar 90 09:58:35 PST
- From: comuzzi@took.enet.dec.com
- To: mills@udel.edu
- Subject: RE: DTS and NTP revisited
-
- ... The Digital Time Service (DTS) for the Digital Network
- Architecture (DECnet) is intended to synchronize time in computer
- networks ranging in size from local to wide-area.
-
- You seem to be trying to clothe DTS in proprietary cloth. We now refer
- to DECnet as DECnet/OSI since we've incorporated OSI protocols into the
- protocol stack. It is our intention to pursue DTS in the OSI standards
- forums.
-
- ... As such it is intended to provide service comparable to the
- Network Time Protocol (NTP) for the Internet architecture.
-
- While both are clearly addressing the same problem space, DTS and NTP
- have VERY different goals. I recently spoke to the president of a time
- provider manufacturer and I liked his jargon: he distinguished between
- the time-of-day market and the frequency market. The time-of-day market
- wants to know what time it is; it is not interested in small errors and
- it doesn't want to pay a lot. The frequency market wants stable
- frequency sources, needs high stability and is willing to pay.
-
- NTP is a solution for the frequency market. DTS is only interested in
- the time-of-day market. The major cost for these solutions is not the
- initial capital investment, but the long term management and operation
- cost. As such DTS has goals of auto-configurability and ease of
- management which are not present in NTP.
-
- ... Local clocks are maintained at designated time servers, which
- are timekeeping systems belonging to a synchronization subnet in
- which each server measures the offsets between its local clock and
- the clocks of other servers in the subnet. In this memorandum to
- synchronize frequency means to adjust the clocks in the subnet to
- run at the same frequency, to synchronize time means to set them to
- agree at a particular epoch with respect to Coordinated Universal
- Time (UTC), as provided by national standards, and to synchronize
- clocks means to synchronize them in both frequency and time. The
- goal of a distributed timekeeping service such as NTP and DTS is to
- synchronize the clocks in all participating servers and clients so
- that all are correct, indicate the same time relative to UTC, and
- maintain specified measures of stability, accuracy and reliability.
-
- As stated above, DTS is addressing the time-of-day market, hence high
- frequency stability is not a goal of DTS.
-
- ... Servers, both primary and secondary, typically run NTP with
- several other servers at the same or lower stratum levels; however,
- a selection algorithm attempts to select the most accurate and
- reliable server or set of servers from which to actually
- synchronize the local clock. The selection algorithm, described in
- more detail later in this document, uses a maximum-likelihood
- clustering algorithm to determine the best from among a number of
- possible servers. The synchronization subnet itself is
- automatically constructed from among the available paths using the
- distributed Bellman-Ford routing algorithm [BER87], in which the
- distance metric is modified hop count.
-
- Note that in DTS loops are not a problem: if a system sends out a time
- and ultimately gets back a derived time, the communication delays ensure
- the derived time will always arrive back with a larger inaccuracy. The
- only exception to this is the possibility of a system with a time
- provider and a lousy clock. Then the derived time's inaccuracy could be
- smaller if the time was parked in a system with a good clock. But in
- this case the network clearly has information that the original system
- has lost.
-
- ... The NTP specification includes no architected procedures for
- servers to obtain addresses of other servers other than by
- configuration files and public bulletin boards.
-
- This is a serious shortcoming of NTP and definitely makes it harder to
- manage. It is unclear to me why you haven't fixed this since it would
- not seem that difficult to store server names in a namespace.
-
- ... While servers passively respond to requests from other servers,
- they must be configured in order to actively probe other servers.
- Servers configured as active poll other servers continuously, while
- servers configured as passive poll only when polled by another server.
- There are no provisions in the present protocol to dynamically
- activate some servers should other servers fail.
-
- This is harder to fix and interacts with the spanning tree. Here at
- least I can see why you didn't make it easier to manage. These problems
- make NTP a system administrator's nightmare, but are consistent with the
- two different sets of goals. Consistent with DTS goals we've accepted
- some "clock hopping" in exchange for ease of management.
-
- ... In DTS a synchronization subnet consists of a structured graph
- with nodes consisting of clerks, servers, couriers and time
- providers. With respect to the NTP nomenclature, a time provider is
- a primary server, a courier is a secondary server intended to
- import time from one or more distant primary servers for local
- redistribution and a server is intended to provide time for
- possibly many end nodes or clerks. Time providers, servers and
- couriers are evidently generic, in that all perform similar
- functions and have similar or identical internal structure.
-
- Not only are they generic, they are dynamic. If a time provider system
- loses its radio signal, it immediately reverts to a server, providing
- graceful degradation in the presence of failures.
-
- ... The intent is that time providers can be set from radios,
- telephone calls to NIST [NBS88] or even manually.
-
- The DTS story is actually even better here, we provide a well defined
- time provider interface. This can be used to implement a time provider
- without requiring modification of the protocol portions of the time
- service. (On Unix systems it uses Unix domain sockets). This greatly
- eases adding a new time provider, and permits time provider vendors to
- supply it with their hardware. Note, NTP could (and probably should) do
- this also. We have already done it.
-
- As in NTP, DTS clients and servers periodically request the time
- from other servers, although the subnet has only a limited ability
- to reconfigure in the event of failure.
-
- I don't understand this statement. Reconfiguration within a LAN is about
- as complete as one could imagine. The random selection of global servers
- is robust against any non-partitioning WAN failures.
-
- On local nets DTS servers multicast to each other in order to
- construct lists of servers available on the local wire. Clerks
- multicast requests for these lists, which are returned in monocast
- mode similar to ARP in the Internet. Couriers consult the network
- directory system to find global time providers. For local-net
- operation more than one server can be configured to operate as a
- courier, but only one will actually operate as a courier at a time.
-
- This is false; I think you're failing to distinguish between couriers
- and backup couriers. There can be more than one courier per LAN, each
- will always synchronize with at least one member of the global set.
- Backup couriers use an election algorithm in the absence of a courier.
- Only one backup courier will be elected to function as a courier.
-
- There does not appear to be a multicast function in which a
- personal workstation could obtain time simply by listening on the
- local wire without first obtaining a list of local servers.
-
- That is correct, it would violate the principle that a message exchange
- has to happen in order to correctly assign an inaccuracy.
-
- ... Both NTP and DTS exist to provide timestamps to some specified
- accuracy and precision. NTP represents time as a 64-bit quantity in
- seconds and fractions, with 32 bits as integral seconds and 32 bits
- as the fraction of a second. This provides resolution to about 200
- picoseconds and rollover ambiguity of about 136 years. The origin
- of the timescale and its correspondence with UTC, atomic time and
- Julian days is documented in [MIL90c]. DTS represents time to a
- precision of 100 nanoseconds, although there appears to be no
- specified maximum value.
-
- The DTS time is a signed 64-bit count of 100-nanosecond units since Oct
- 15, 1582. It will not run out until after the year 30,000 AD, unlike
- NTP, which will run out in 2036. I, for one, intend to still be alive in
- 2036!
- There are two reasons the 100 ns. was chosen:
-
- 1) We want to use these timestamps as a time representation, for
- filesystem timestamps, etc. We REALLY don't want to deal with the
- problem that our representation is inadequate in some reasonably
- future time. Also, since the 64 bits is signed, times back to 28,000
- BC can be represented. This is potentially useful for astronomical
- data, and happily, includes all of recorded history. If we decreased
- the resolution, we would give up range. This choice seemed like a
- reasonable compromise.
-
- 2) Since we include the transmission delay in the inaccuracy, 100 ns
- represents only 30 meters. It's not meaningful to talk about
- synchronizing clocks below that level with our algorithm. (I believe
- it's not meaningful to talk about synchronizing clocks below that
- level with NTP either).
-
- The total timestamp is 128 bits; this includes a four-bit version
- number field which would permit these decisions to be revisited in the
- future.
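-
- The two formats can be compared directly. A sketch converting a POSIX
- time into each representation; the epoch offsets are the standard
- constants (2208988800 seconds from 1900 to 1970 for NTP, and
- 122192928000000000 100-ns units from 15 October 1582 to 1970 for the
- DTS-style timescale), and the function names are mine:
-
```python
# Epoch offsets between the two timescales and the Unix epoch:
# 70 years (with 17 leap days) of seconds for NTP, and the standard
# count of 100-ns units from 15 October 1582 to 1 January 1970.
NTP_UNIX_OFFSET = 2208988800
DTS_UNIX_OFFSET = 122192928000000000

def to_ntp(unix_seconds):
    """Pack a POSIX time into the 64-bit NTP format: 32 bits of
    seconds since 1900 and 32 bits of binary fraction."""
    seconds = int(unix_seconds) + NTP_UNIX_OFFSET
    fraction = int((unix_seconds % 1.0) * 2**32)
    return ((seconds & 0xFFFFFFFF) << 32) | (fraction & 0xFFFFFFFF)

def to_dts(unix_seconds):
    """Express a POSIX time as a signed count of 100-ns units since
    15 October 1582, as in the DTS timestamp described above."""
    return int(unix_seconds * 10_000_000) + DTS_UNIX_OFFSET
```
-
- The NTP fraction resolution of 1/2^32 s is about 233 picoseconds,
- consistent with the roughly 200-picosecond figure quoted earlier.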
-
- ... With respect to applications involving precision time data,
- such as national standards laboratories, resolutions less than the
- 100 nanoseconds provided by DTS are required. Present timekeeping
- systems for space science and navigation can maintain time to
- better than 30 nanoseconds, while range data over interplanetary
- distances can be determined to less than a nanosecond. While an
- ordinary application running on an ordinary computer could not
- reasonably be expected to produce or render precise timestamps
- anywhere near the 200-picosecond limit of an NTP timestamp, there
- are many applications where a precision timestamp could be rendered
- by some other means and propagated via a computer and network to
- some other place for processing. One such application could well be
- synchronizing navigation systems like LORAN-C, where the timestamps
- would be obtained directly from station timekeeping equipment.
-
- There is an obvious inconsistency in your position here. If you're just
- using the NTP time format for synchronization, then talking about 136
- year rollovers makes some sense. It could be hidden from the users by
- extending the protocol. If, however, as this paragraph implies you
- intend the NTP time format as a general timestamp, then there will be
- extreme pain in the year 2036. (This is referred to in DEC as the
- "date75" problem!) To avoid this without unduly extending the timestamp
- DTS has traded off being able to use its timestamp format for certain
- highly precise applications.
-
- NTP specifically and intentionally has no provisions anywhere in
- the protocol to specify time zones or zone names. The service is
- designed to deliver UTC seconds and Julian days without respect to
- geographic position, political boundary or local custom. Conversion
- of NTP timestamp data to system format is expected to occur at the
- presentation layer; however, provisions are made to supply leap-
- second information to the presentation layer so that network time
- in the vicinity of leap seconds can be properly coordinated. DTS
- includes provision for time zones and presumably summer/winter
- adjustments in the form of a numerical time offset from UTC and
- arbitrary character-string label; however, it is not obvious how to
- distribute and activate this information in a coordinated manner.
-
- The information is used only as a help in user displays. That is, an
- application can display BOTH the UTC time and the local time at which a
- timestamp was created. It only cost 12 bits to do this. No use is made
- of the timezone information by DTS or by systems.
-
- ... The accuracy and stability expectations of NTP preclude this
- approach. In NTP the incidence of leap seconds is assumed available
- in advance at all primary servers and distributed automatically
- throughout the remainder of the synchronization subnet as part of
- normal protocol operations. Thus, every server and client in the
- subnet is aware at the instant the leap second is to take effect,
- and steps the local clock simultaneously with all other servers in
- the subnet. Thus, the local clock accuracy and stability are
- preserved before, during and after the leap insertion.
-
- Each server has to maintain and propagate this state before the leap
- insertion. This is, of course, subject to Byzantine failures. A failing
- server can insert a bad notification.
-
- ... In NTP the roundtrip delay d and clock offset c of server B
- relative to A are
-
- d = (t4-t1) - (t3-t2)
- c = ((t2-t1) + (t3-t4))/2.
-
- This method amounts to a continuously sampled, returnable-time
- system, which is used in some digital telephone networks [LIN80].
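-
- In code the sampling arithmetic is one line per quantity; t1 and t4
- are read from the client's clock, t2 and t3 from the server's:
-
```python
def ntp_sample(t1, t2, t3, t4):
    """The roundtrip delay and clock offset given above.

    t1 = client transmit, t2 = server receive, t3 = server transmit,
    t4 = client receive, all in seconds.
    """
    d = (t4 - t1) - (t3 - t2)
    c = ((t2 - t1) + (t3 - t4)) / 2.0
    return d, c
```
-
- The derivation assumes the outbound and return delays are equal; a
- systematic asymmetry of a seconds biases the computed offset by a/2.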
-
- The derivation of the expression for 'c' above assumes the two transit
- delays for this exchange are symmetric. If there are systematically
- asymmetric transmission delays then the NTP algorithm will shift the two
- clocks so that they appear to be synchronized, when in fact they are
- systematically off by some number of milliseconds. The NTP minimum
- filter attempts to minimize this effect assuming that the shortest round
- trip exchange would have to be symmetric or nearly so. Unfortunately
- quite large systematic asymmetric delays can occur for a variety of
- reasons: source-routed networks, broken routing tables, etc. and these
- would apply to all transactions including the shortest. This problem
- exists in DTS also, but in DTS both of the systems will have an
- inaccuracy which encompasses the correct time. That is, DTS will not
- claim to have synchronized clocks to a level which it has not, even in
- the presence of asymmetric delays. NTP can and has.
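- The four-timestamp calculation and the asymmetry bias described above
- can be checked with a short sketch (my own illustration in Python,
- with timestamps in integer milliseconds; nothing here comes from
- either specification):

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Roundtrip delay d and offset c of server B relative to A,
    from the four NTP timestamps: t1 = A sends, t2 = B receives,
    t3 = B replies, t4 = A receives (milliseconds)."""
    d = (t4 - t1) - (t3 - t2)
    c = ((t2 - t1) + (t3 - t4)) / 2
    return d, c

# Symmetric 5 ms paths, clocks actually in step: offset comes out 0.
print(ntp_delay_offset(0, 5, 6, 11))    # (10, 0.0)

# Asymmetric paths (9 ms out, 1 ms back), clocks still in step: the
# estimated offset is biased by half the asymmetry, here 4 ms, while
# the measured roundtrip delay is identical.
print(ntp_delay_offset(0, 9, 10, 11))   # (10, 4.0)
```
- Because the measured roundtrip delay is identical in both cases, even
- the minimum filter cannot tell them apart, which is exactly the
- failure mode described above.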
-
- ... Both NTP and DTS have to do a little dance in order to account
- for timing errors due to the precisions of the local clocks and the
- frequency offsets (usually minor) over the transaction interval
- itself. A purist might argue that the expression given above for
- delay and offset are not strictly accurate unless the probability
- density functions for the path delays are known, properly convolved
- and expectations computed, but this applies to both NTP and DTS.
- The point should be made, however, that correct functioning of DTS
- requires reliable bounds on measured roundtrip delay, as this
- enters into the error budget used to construct intervals over which
- a clock can be considered correct.
-
- However, this is not at all hard to compute. Simply increase the
- inaccuracy by the potential drift of the local clock during the
- transaction. The architecture specifies this.
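- As a sketch of that rule (the function name and the numbers are
- illustrative, not taken from the DTS architecture document):

```python
def widen_inaccuracy(inaccuracy_ms, drift_ppm, elapsed_s):
    """Grow an error bound by the local clock's worst-case drift
    over the elapsed transaction time (illustrative only)."""
    return inaccuracy_ms + drift_ppm * elapsed_s / 1000  # ppm * s -> ms

# A 10 ms bound after a 2 s transaction at 100 ppm worst-case drift
# widens by 0.2 ms:
print(widen_inaccuracy(10.0, 100, 2))
```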
-
- ... NTP maintains for each server both the total estimated
- roundtrip delay to the root of the synchronization subnet
- (synchronizing distance), as well as the sum of the total
- dispersion to the root of the synchronization subnet (synchronizing
- dispersion).
-
- This synchronizing distance has a rather loose definition. I believe the
- current NTP RFC suggests using ten times the mean expected error for the
- synchronizing distance. If this parameter is important to the NTP
- algorithm I would expect some stronger specification. Also, where does
- the value ten come from? I know it's experimentally derived and seems to
- work.
-
- ... These quantities are included in the message exchanges and form
- the basis of the likelihood calculations. Since they always
- increase from the root, they can be used to calculate accuracy and
- reliability estimates, as well as to manage the subnet topology to
- reduce errors and resist destructive timing loops.
-
- While you state the synchronizing distance and synchronizing dispersion
- can be used to calculate accuracy, I have never seen a derivation of how
- this could be done. This is one of the recurring points, the lack of
- formal proofs.
-
- ... The next step is designed to detect falsetickers or other
- conditions which might result in gross errors. The pruned and
- truncated candidate list is re-sorted in the order first by stratum
- and then by total synchronizing distance to the root; that is, in
- order of decreasing likelihood. A similar procedure is also used in
- Marzullo's MM algorithm [MAR85]. Next, each entry is inspected in
- turn and a weighted error estimate computed relative to the
- remaining entries on the list. The entry with maximum estimated
- error is discarded and the process repeats. The procedure
- terminates when the estimated error of each entry remaining on the
- list is less than a quantity depending on the intrinsic precisions
- of the local clocks involved.
-
- A point which is not discussed here is that when NTP chooses to prune an
- entry, it can not determine if this entry's problem is that it comes
- from a bad clock (falseticker in your jargon), or experienced unusually
- large and asymmetric network delays. The latter case is something to be
- expected in normal operation, the former represents a problem which
- should be fixed. DTS uses the interval information to identify such bad
- clocks, and reports them, since if a clock's interval doesn't intersect
- the majority it is clearly faulty. This is, of course, a MAJOR issue in
- distributed system management.
-
- ... The fundamental assumption upon which the DTS is founded is
- Marzullo's proof that a set of M clocks synchronized by the above
- algorithm, where no more than j clocks are faulty, delivers an
- interval including UTC. The algorithm is simple, both to express
- and to implement, and involves only one sorting step instead of two
- as in NTP. However, consider the following scenario with M = 3, j =
- 1 and containing three intervals A, B and C:
-
- A +--------------------------+
- B +----+
- C +----+
-
-
- Result +-----================-----+
-
- Using the algorithm described in the DTS functional specification,
- both the lower and upper endpoints of interval A are in M-j = 2
- intervals, thus the resulting interval is coincident with A.
- However, there remains the interval marked "=" which contains
- points not contained in at least two other intervals. The DTS
- document mentions this interesting fact, but makes a quite
- reasonable choice to avoid multiple intervals in favor of a single
- one, even if that does in principle violate the correctness
- assumptions.
-
- Come on, this in no way violates the correctness assumption. The proofs
- tell us that the correct time is somewhere in the two dashed sub-
- intervals. By making the statement that the time is somewhere in the
- larger interval, a server is making a WEAKER assertion. Marzullo's proof
- would apply and the algorithm would work (sub-optimally) if servers
- arbitrarily lengthened the intervals they computed.
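- The A/B/C scenario above can be checked mechanically. Here is a
- sketch (my own simplified rendering, not the DTS specification's
- pseudocode) of the endpoint sweep that returns the single enclosing
- interval whose endpoints each lie in at least M - j of the inputs:

```python
def dts_style_intersection(intervals, j):
    """Given M = len(intervals) closed intervals (lo, hi), of which at
    most j may be faulty, sweep the sorted endpoints and return the
    single interval whose endpoints are each covered by at least
    M - j inputs (a simplified Marzullo-style selection)."""
    need = len(intervals) - j
    # Open edges (kind 0) sort before close edges (kind 1) at the same
    # point, so intervals that merely touch still count as intersecting.
    edges = sorted([(lo, 0) for lo, hi in intervals] +
                   [(hi, 1) for lo, hi in intervals])
    covered, lo, hi = 0, None, None
    for x, kind in edges:
        if kind == 0:                       # an interval opens
            covered += 1
            if covered >= need and lo is None:
                lo = x                      # first sufficiently covered point
        else:                               # an interval closes
            if covered >= need:
                hi = x                      # last sufficiently covered point
            covered -= 1
    return lo, hi

# The picture from the text: B overlaps A's low end, C its high end,
# and the middle of A is covered by A alone.
A, B, C = (0, 26), (0, 5), (21, 26)
print(dts_style_intersection([A, B, C], j=1))   # (0, 26): coincident with A
```
- The middle of A (the region marked "=") is covered only by A itself,
- yet the returned interval spans it: the weaker, enclosing assertion
- described in the rebuttal above.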
-
- ... In point of fact, the local clock model described in the NTP
- specification is listed as optional in the same spirit as the model
- described in the DTS functional description. As such, the local
- clock can in principle be considered implementation specific and
- not part of the formal specification.
-
- This is a rather odd statement. What I read is that the local clock
- model is not explicitly required by the NTP documents, but it is, in
- fact, required in functioning implementations.
-
- However, as demonstrated above, frequency compensation requires the
- local clock adjustment to be carefully specified and implemented.
- The NTP mechanism has been carefully analyzed, simulated,
- implemented and deployed in the Internet, but DTS has not.
-
- I have never read a clear specification of the required quality of the
- input time to NTP. However, the following argument shows that in a LAN
- of typical machines, DTS can indeed provide time to NTP. The clock
- resolution of most machines is between 1 and 16.7 milliseconds. Thus,
- any single measurement made by NTP MUST experience this clock jitter.
- NTP can achieve better overall results only by averaging many such
- measurements. We have measured the 'jitter' of DTS times in LANs; it is
- less than 10 milliseconds, so if DTS supplies time to NTP in a typical
- LAN, NTP will receive time similar in quality to the time it gets
- from other NTP servers. In the WAN case the jitter may be a problem; I
- assume that interoperating in the presence of WAN links may require
- clock training.
-
- If you could provide the derivation of accuracy from synchronization
- distance and synchronization dispersion that you allude to in section
- 4.2, this could form the basis of reliable interoperation with NTP
- supplying time to DTS. Alas, I suspect such a derivation is
- unachievable. However, for installations which are not concerned with
- the DTS guarantee, the time provider interface could be used to import
- NTP time into DTS (just like any time provider, though there would have
- to be a user supplied inaccuracy, based on local experience with NTP).
- We intend to include a sample time provider program to permit this.
-
- ... It is an uncontested fact that computer systems can be badly
- disrupted should apparent time appear to warp (jump) backwards,
- rather than always tick forward as our universe requires. Both NTP
- and DTS take explicit precautions to avoid the local clock running
- backwards or large warps when running forwards. However, both NTP
- and DTS models recognize that there are some extreme conditions in
- which it is better to warp backwards or forwards, rather than allow
- the adjustment procedure to continue for an outrageously long time.
- The local clock is warped if the correction exceeds an
- implementation constant, +-128 milliseconds for NTP and ten minutes
- for DTS. The large difference between the NTP and DTS values is
- attributed to the accuracy models assumed.
-
- I believe the difference also comes from different assumptions of the
- risks (and probabilistic costs) involved in jumping the clock. We assume
- it is something you want to do rarely.
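- The warp-versus-slew policy in the quoted paragraph reduces to a
- threshold test; a minimal sketch (the thresholds come from the text,
- everything else is illustrative):

```python
NTP_STEP_S = 0.128   # NTP warps beyond +-128 ms (per the text)
DTS_STEP_S = 600.0   # DTS warps beyond ten minutes (per the text)

def clock_action(offset_s, step_threshold_s):
    """Choose between stepping (warping) the clock outright and
    slewing it gradually, per the policy described above."""
    return "step" if abs(offset_s) > step_threshold_s else "slew"

print(clock_action(0.050, NTP_STEP_S))   # slew: within 128 ms
print(clock_action(2.0, NTP_STEP_S))     # step: beyond 128 ms
print(clock_action(2.0, DTS_STEP_S))     # slew: well under ten minutes
```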
-
- For most servers and transmission paths in the Internet a offset
- spike (following filtering, selection and combining operations)
- over +-128 milliseconds following filtering, selection and
- combining operations is so rare as to be almost negligible.
-
- The duplicated text makes me think there is something wrong here, though
- frankly I don't understand what this paragraph is trying to say.
-
- ... The service objectives of both NTP and DTS are substantially
- the same: to deliver correct, accurate, stable and reliable time
- throughout the synchronization subnet. However, as demonstrated in
- this document, these objectives are not all simultaneously
- achievable. For instance, in a system of real clocks some may be
- correct according to an established and trusted criterion
-     (truechimers) and some may not (falsetickers). In the models used by
- NTP and DTS the distinction between these two groups is made on the
- basis of different clustering techniques, neither of which is
- statistically infallible. A succinct way of putting it might be to
- say that NTP attempts to deliver the most accurate, stable and
- reliable time according to statistical principles, while DTS
- attempts to deliver validated time according to correctness
- principles, but possibly at the expense of accuracy and stability.
-
- I would claim you're understating DTS's goals of autoconfigurability and
- manageability.
-
- In both the NTP or DTS models the problem is to determine which
- subset of possibly many clocks represents the truechimers and which
- do not. An interesting observation about both NTP and DTS is that
- neither attempts to assess the relative importance of misses
- (mislabelling a truechimer as a falseticker) relative to false
- alarms (mislabelling a falseticker as a truechimer). In signal
- detection analysis this is established by the likelihood ratio,
- with high ratios favoring misses over false alarms. In retrospect,
- it could be said that NTP assumes a somewhat lower likelihood ratio
- than does DTS.
-
- I'm not sure I understand your jargon here. The important tradeoff for
- DTS is to notify managers of broken clocks (calling a falseticker a
- falseticker) so that they can be fixed. Declaring a good clock bad
- (labeling a truechimer a falseticker) could only occur in DTS as an
- implementation error or as a massive multi-server failure. In either
- case a human will have to get involved.
-
- It might be concluded from the discourse in this document that, if
- the service objective is the highest accuracy and precision, then
- the protocol of choice is NTP; however, if the objective is
- correctness, then the protocol of choice is DTS. However, the
- discussion in Section 4.2 casts some doubt either on this claim,
- the DTS functional specification or this investigator's
- interpretation of it.
-
- I believe you are doing your position a disservice by raising this red
- herring. No one has found your argument that DTS violates the
- assumptions of Marzullo's thesis convincing. Lamport commented that it
- indicates a serious misunderstanding of Marzullo's proof.
-
- It is certainly true that DTS is "simple" and NTP is "complex," but
- these are relative terms and the complexity of NTP did not result
- from accident. That even the complexity of NTP is surmountable is
- demonstrated by the fact that over 2000 NTP-synchronized servers
- chime each other in the Internet now.
-
- The ever decreasing cost of time providers argues heavily for a simple
- solution, even though it may require more time providers. It simply
- isn't worth a lot of software complexity (and maintenance cost, and
- management cost) to avoid spending a few dollars to buy more providers.
- Further, the philosophy of 'correctness' leads to certifiable
- implementation by independent vendors.
-
- ... The widespread deployment of NTP in the Internet seems to
- confirm that distributed Internet applications can expect that
- reliable, synchronized time can be maintained to within about two
- orders of magnitude less than the overall roundtrip delay to the
- root of the synchronization subnet. For most places in the Internet
- today that means overall network time can be confidently maintained
- to a few tens of milliseconds [MIL90a]. While the behavior of
- large-scale deployment of DTS in internet environments is unknown,
- it is unlikely that it can provide comparable performance in its
- present form. With respect to the future refinement of DTS, should
- this be considered, it is inevitable that the same performance
- obstacles and implementation choices found by NTP will be found by
- DTS as well.
-
- I disagree with this final paragraph. I think that NTP and DTS both
- attain their very different goals. Our difference of opinion is in how
- important the different goals are. I accept that DTS will not keep
- clocks quite as tightly synchronized as NTP. It will, however, be a
- product that a vendor can confidently ship to customers who are expected
- to install, configure and manage it themselves.
-
- ------------------------------------------------------------------------
-
- Date: Mon, 19 Mar 90 14:19:44 EST
- From: Dennis Ferguson <dennis@gw.ccie.utoronto.ca>
- To: Mills@udel.edu, comuzzi@took.dec.com, elb@mwunix.mitre.org,
- marcus@osf.org
- Subject: Re: Review and comment on comparison of DTS and NTP
-
- I have avoided more than a brief comment on DTS since my copy was
- compressed two pages onto a page and then FAXed to a really rotten FAX
- machine here. It is half unreadable, and I have been a little reluctant
- to get involved in this for fear I might attribute to DTS shortcomings
- which are dealt with in the parts I can't read. I don't, however, feel
- my particular concerns about DTS have been adequately dealt with by what
- has passed so far.
-
- As a consumer of time I care only about results, and what is required of
- me to achieve them. In particular I care about three things: that my
- computer's system clock be as accurate as possible as much of the time
- as possible for a reasonable expenditure of CPU cycles and network
- bandwidth, that I not have to work too hard to achieve this, and that I
- have some way of verifying the truthfulness of the time I am receiving.
- Note that how accurate is "accurate" is hardly quantifiable; the clock
- should provide the time as accurately as is possible within the
- constraints stemming from hardware and network deficiencies. Similarly,
- I would like to do no work to achieve this other than starting a
- program. The "truthfulness" part is important since one of the major
- application groups which will require time synchronization will no doubt
- be authentication protocols (e.g. SNMP authentication, and Kerberos to
- some degree) and I don't want to leave a hole for attacking these
- through the time protocol.
-
- In this light I'd like to make some comments on the recent round of
- debate concerning DTS versus NTP.
-
- NTP is a solution for the frequency market. DTS is only interested
- in the time-of-day market. The major cost for these solutions is
- not the initial capital investment, but the long term management
- and operation cost. As such DTS has goals of auto-configurability
- and ease of management which are not present in NTP.
-
- This is blatantly misleading. NTP is more accurate than DTS because it
- includes computational machinery to condition the local clock, which DTS
- lacks. DTS includes a sub-protocol for autoconfiguring a large portion
- of the synchronization subnet, NTP does not (or at least not on the
- scale of DTS). All this is true.
-
- What is misleading is the strong implication that these issues are
- somehow related. They are completely and utterly orthogonal. NTP could
- be enhanced with an auto-configuration protocol, and indeed could use a
- variant of DTS' scheme, without affecting the precision it achieves one
- whit. Similarly, DTS' omission of the local clock machinery was in no way
- necessitated by its ability to auto-configure, nor any other aspect of
- the protocol which I can fathom. DTS just left this part out, and as a
- consequence is sloppier.
-
- The "time-of-day market" versus "frequency market" analogy is hence
- quite faulty, I think. I can see no cost at all associated with NTP's
- precision, perhaps other than requiring additional work on the part of
- the implementer of the software.
-
-     As stated above, DTS is addressing the time-of-day market, hence
-     high-frequency stability is not a goal of DTS.
-
- Again, this is so silly. As far as I can see, DTS has gained exactly
- nothing by leaving out the local clock conditioning code, but has lost
- an order of magnitude or more in accuracy under normal conditions and
- several orders of magnitude should the subnet partition and cause loss
- of reachability of the radio clock servers. Just saying this "is not a
- goal of DTS" with the "time-of-day market" irrelevancy thrown in does
- not make this reasonable.
-
- This from Dave Mills, followed by Joe Comuzzi, followed by Dave Mills:
-
- There does not appear to be a multicast function in which a
- personal workstation could obtain time simply by listening on
- the local wire without first obtaining a list of local
- servers.
-
- That is correct, it would violate the principle that a message
- exchange has to happen in order to correctly assign an inaccuracy.
-
- There appears to be a considerable Internet constituency which
- has noisily articulated the need for a multicast function when
- the number of clients on the wire climbs to the hundreds.
- Having responded to the articulation noise, I thought it might
- be a reasonable idea to include this capability (so far
- untested) on LANs with casual workstations, promiscuous
- servers and simple protocol stacks.
-
- This capability is certainly not untested. My NTP daemon does
- multicasting, and I synchronize about 80 machines here this way (I also
- note that 8/9ths of the computers which make up the NSFnet backbone are
- also time synchronized with broadcast NTP. I have no idea how many other
- users there are). The clients which are synchronized this way require no
- configuration and, indeed, the scheme scales well since I could
- synchronize 100's of machines this way at no additional expense. The NTP
- clock filter is adaptable for use with one-way time, the selection
- algorithms continue to work as always, and systematic time skews between
- machines with precise clocks are on the order of a few milliseconds (and
- stably so. This is NTP, after all). Truthfully, while I like the DTS
- approach to clock combination, I'm not sure I care about knowing the
- inaccuracy enough to miss knowing it on hosts at the bottom of the
- synchronization tree. Multicast time seems to me to be a feature the
- "time-of-day market" could truly make good use of.
-
-     The DTS time is a signed 64-bit count of 100-nanosecond units since
-     Oct 15, 1582. It will not run out until after the year 30,000 AD,
-     unlike NTP, which will run out in 2036. I, for one, intend to still be
- alive in 2036!
-
- Note that this comment is only relevant if one requires that time stamps
- for long lived things like files be identical to timestamps used by the
- time synchronization protocol. I see no reason to require this, about
- the only thing you save are a couple of conversion routines. If this
- isn't a requirement then the only thing NTP has to worry about are
- packets which spend more than 136 years in transit, and that there be
- some external time source which allows you to determine the time-of-day
- to within 68 years before running the protocol. I can't get too excited
- about this.
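- (The 136- and 68-year figures follow from NTP's 32-bit seconds field,
- counted from 1900; a quick check of the arithmetic:)

```python
# NTP's 64-bit timestamp carries 32 bits of whole seconds, so the era
# wraps after 2**32 seconds; half an era is the largest initial
# ambiguity a one-shot exchange can resolve.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
era_years = 2 ** 32 / SECONDS_PER_YEAR
print(round(era_years))       # 136 -> 1900 + 136 = 2036
print(round(era_years / 2))   # 68
```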
-
-     2) Since we include the transmission delay in the inaccuracy,
-     100 ns represents only 30 meters. It's not meaningful to talk
-     about synchronizing clocks below that level with our algorithm.
-     (I believe it's not meaningful to talk about synchronizing clocks
- below that level with NTP either).
-
- I have some 68000-based hardware with microsecond system clocks which
- have an on-time second output I can look at with an oscilloscope, or
- plot on a chart recorder. The clocks have ovenized crystal oscillators.
- I see synchronization between the machines, via NTP across an ethernet,
- on the order of 50-75 us (after calibrating out systematic asymmetric
- delays in the code). This is hardly trying, either. I don't think 100 ns
- is particularly small. GPS can give me UTC more precisely than that, I
- don't think it unreasonable to expect time synchronization protocols to
- be prepared to move time with precisions I can get today, let alone 20
- years from now.
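- (For reference, the quoted 30-meter figure is simply the distance
- light travels in 100 ns:)

```python
C = 299_792_458      # speed of light in vacuum, m/s
print(C * 100e-9)    # roughly 30 m per 100 ns of timing error
```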
-
- From Dave Mills, followed by Joe Comuzzi:
-
- Both NTP and DTS have to do a little dance in order to account
- for timing errors due to the precisions of the local clocks
- and the frequency offsets (usually minor) over the transaction
- interval itself.
-
- However, this is not at all hard to compute. Simply increase the
- inaccuracy by the potential drift of the local clock during the
- transaction. The architecture specifies this.
-
- What bugs me about this is that increasing the inaccuracy by the
- potential drift (it appears to me the latter must be configured as well,
- since DTS doesn't seem to include any machinery to determine it on the
- fly. DTS may not be quite as "manageable" as one might believe) may make
- the protocol work nicely, but doesn't do a damn thing for the precision
- of my drifting system clock. The latter is what I run a time
- synchronization protocol for; the fact that DTS can tell me it is doing
- a bad job by giving me error bounds doesn't excuse the fact that it is
- doing a bad job.
-
- It appears to me that the NTP local clock code is a natural for DTS. It
- avoids having to configure an expected drift to get a realistic
- estimate. It essentially reduces the drift by more than an order of
- magnitude. Since the local clock algorithm is analytically well defined
- it should be quite possible to have it produce a meaningful inaccuracy
- for use by the protocol automatically (the compliance estimate is close
- already). The inclusion of the local clock conditioning code would
- affect little else that I can see. Why didn't DTS include it?
-
- A point which is not discussed here is that when NTP chooses to
- prune an entry, it can not determine if this entry's problem is
- that it comes from a bad clock (falseticker in your jargon), or
- experienced unusually large and asymmetric network delays. The
- latter case is something to be expected in normal operation, the
- former represents a problem which should be fixed. DTS uses the
- interval information to identify such bad clocks, and reports them.
-     interval information to identify such bad clocks, and reports them,
-     since if a clock's interval doesn't intersect the majority it is
- system management.
-
- The possibility of separating broken clocks from broken networks is a
- neat feature of DTS' approach, and one much to be desired. I covet this
- ability. But why, oh why, was local clock conditioning ignored and the
- accuracy of the system clock compromised? I covet the latter more
- intensely, and can't understand why I can't have both.
-
- I have never read a clear specification of the required quality of
- the input time to NTP. However, the following argument shows that
- in a LAN of typical machines, DTS can indeed provide time to NTP.
- The clock resolution of most machines is between 1 and 16.7
-     milliseconds. Thus, any single measurement made by NTP MUST
- experience this clock jitter. NTP can achieve better overall
- results only by averaging many such measurements. We have measured
- the 'jitter' of DTS times in LANs, it is less than 10 milliseconds,
- so if DTS supplies time to NTP in a typical LAN, the NTP will
- receive time similar in quality to the time it gets from other NTP
- servers. In the WAN case, the jitter may be a problem, I assume
- that to interoperate in the presence of WAN links may require clock
- training.
-
- It is incorrect that the NTP local clock must experience a jitter of the
- magnitude of the precision of the remote machine's clock, since this
- data is passed through the filter algorithm before it reaches the clock.
-
- What mystifies me is where the 10 milliseconds comes from and how this
- is typical. I am on thin ice here, but let me expose my ignorance by
- making some assertions based on what I can decode from the DTS spec. DTS
- seems to be designed to deal with clock drifts in the 100 ppm ballpark.
- It appears to me that the clock is updated once in 15 minutes (??).
- Without clock conditioning, this would imply that a DTS synchronized
- clock may jitter by 90 ms over the update period, and that on average
- the clock will be 45 ms off (this may be incorrect, but I see nothing at
- all in there which compensates for drift with predictive adjustments).
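- The 90 ms and 45 ms figures follow directly from those assumptions (a
- check of the arithmetic, with the 15-minute update interval assumed as
- stated):

```python
drift_ppm = 100             # assumed worst-case oscillator error
interval_s = 15 * 60        # assumed update period (900 s)
peak_ms = drift_ppm * interval_s / 1000   # ppm * s / 1000 -> ms
print(peak_ms, peak_ms / 2)               # 90.0 45.0 (peak and mean)
```
- The mean is half the peak because an uncompensated clock's error
- traces a sawtooth between corrections.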
-
- This is what is alarming, an NTP-corrected clock likely won't drift by
- 90 ms if left without synchronization for 2 or 3 days, and under normal
- operation when synchronized across a LAN will on average be right dead
- on (or, at least, show systematic offsets in the sub-milliseconds which
- are more related to code path lengths and such). There is no reason why
- DTS couldn't match this performance.
-
- Now, either your 10 ms implies that the "typical" clock you tested with
- had an inherent drift of less than 10 ppm, or that I am grossly mistaken
- and the clock update interval is more like a minute-and-a-half (implying
- a lot of traffic?). If the latter is true, I apologize. If the former is
- true, however, I would suggest you may be in for a big shock when you
- try to run your protocol in the "real world". From data taken from about
- 275 machines which run NTP here, we see an *average* drift of 30 ppm
- slow, with quite a large standard deviation. Fully 5% of those machines
- have drift rates greater than the 100 ppm (I note that none of these are
- built by DEC, which may be why DTS has a more optimistic view of the
- world). There are six workstations in the room next door which drift at
- a rate of 300-350 ppm fast. NTP handles all of these (though it was
- helped with the 300 ppm stations by some priming), and the local clock
- effectively reduces the drift rate to a ppm or less in almost all cases.
- DTS leaves this part grossly underengineered.
-
- It is an uncontested fact that computer systems can be badly
- disrupted should apparent time appear to warp (jump)
- backwards, rather than always tick forward as our universe
- requires.
-
- I believe the difference also comes from different assumptions of
- the risks (and probabilistic costs) involved in jumping the clock.
- We assume it is something you want to do rarely.
-
- Both Dave and Joe miss a fundamental point here. There is nothing in the
- NTP spec which requires the system clock (as opposed to NTP's local
- clock) to step into a +-128 ms window, or a +-512 ms window, or into a
- 15 minute window for that matter. NTP need never step the system clock
- backwards. There is a compilation option to my daemon which causes it to
- slew the system clock under all conditions, as this closes a hole when
- used with Kerberos. The performance when done this way is very nearly
- identical to the performance when the system clock is allowed to step.
- NTP's stepping of the system clock is an absolute non-issue, it can do
- it any way you prefer.
-
- A succinct way of putting it might be to say that NTP attempts
- to deliver the most accurate, stable and reliable time
- according to statistical principles, while DTS attempts to
- deliver validated time according to correctness principles,
- but possibly at the expense of accuracy and stability.
-
- I would claim you're understating DTS's goals of
- autoconfigurability and manageability.
-
- I would claim that all of the above is misleading.
-
- (1) There is no reason that I can see why DTS' accuracy and stability
- couldn't be improved to NTP levels without violating correctness
- principles. Indeed, such an enhancement could only improve DTS'
- error bound estimate since it seems to me the local clock could be
- used to automatically produce estimates of things that need to be
- configured now.
-
- (2) NTP's accuracy and stability in no way preclude additions to the
- protocol to ease configuration and management. The latter just
- hasn't been done yet.
-
- (3) DTS' autoconfigurability and manageability have nothing to do with
- its ability to achieve NTP's level of performance, or lack thereof.
- The computational machinery required to do the latter was omitted
- for no good reason that I can see.
-
- The ever decreasing cost of time providers argues heavily for a
- simple solution, even though it may require more time providers. It
- simply isn't worth a lot of software complexity, (and maintenance
- cost, and management cost) to avoid spending a few dollars to buy
- more providers. Further, the philosophy of 'correctness' leads to
- certifiable implementation by independent vendors.
-
- I continue to believe it is not constructive to "certify
- correctness" in probabilistic systems, only to exchange
- acceptable tolerance bounds for acceptable error bounds. If by
- "time providers" you imply each is associated with a radio
- clock, I do not think it likely that the cost of a radio clock
- will plummet to the point that every LAN can afford one and,
- even if it did, you can not trust a single radio. You have to
- have more than one of them and, preferably, no common point of
- failure between them.
-
- I find myself in agreement, and disagreement, with both of these points
- of view. I am personally a believer that every LAN should have a "time
- provider" or three, and that the only thing which prevents this from
- happening is a chicken-and-egg problem (radio clocks are low volume
- items and hence are expensive. Radio clocks are expensive, so not a lot
- of people want to buy them).
-
- Again and again, however, the issue of the maintenance and management
- cost versus performance tradeoff rears its ugly head. *There* *is* *no*
- *such* *tradeoff*, the issues are orthogonal. Moreover, since the local
- clock processing is utterly divorced from the rest of the protocol, and
- since NTP's local clock is far and away the part of the spec most
- solidly supported by analysis, I can see no reason whatsoever that its
- inclusion in DTS would affect one's ability to produce certifiable
- implementations in any way.
-
- ... The widespread deployment of NTP in the Internet seems to
- confirm that distributed Internet applications can expect that
- reliable, synchronized time can be maintained to within about
- two orders of magnitude less than the overall roundtrip delay
- to the root of the synchronization subnet. For most places in
- the Internet today that means overall network time can be
- confidently maintained to a few tens of milliseconds [MIL90a].
- While the behavior of large-scale deployment of DTS in
- internet environments is unknown, it is unlikely that it can
- provide comparable performance in its present form. With
- respect to the future refinement of DTS, should this be
- considered, it is inevitable that the same performance
- obstacles and implementation choices found by NTP will be
- found by DTS as well.
-
- I disagree with this final paragraph. I think that NTP and DTS both
- attain their very different goals. Our difference of opinion is in
- how important the different goals are. I accept that DTS will not
- keep clocks quite as tightly synchronized as NTP. It will, however,
- be a product that a vendor can confidently ship to customers who
- are expected to install, configure and manage it themselves.
-
- Again the implication that a time protocol which is configurable and
- manageable cannot be precise. It is not that DTS cannot be precise, it
- is that it is not precise. I still have yet to see even one clear,
- understandable advantage which has been gained by not making DTS
- precise.
-
- If I had to choose, I'd send both protocols back to the drawing board
- for further revision. NTP badly needs some work done in the area of
- auto-configuring large synchronization subnets, since this can be
- painful. DTS compromises its precision by ignoring relatively simple,
- straightforward, analytically sound techniques for improving the
- behaviour of the local clock, a deficiency which buys it nothing in
- other areas that I can see.
-
- DTS' "correctness" philosophy is truly attractive to me. If we were
- comparing paper protocols I would rate DTS a hands down winner. The
- thing is, "correctness" carries less weight with me in the comparison to
- NTP since NTP is a protocol derived from long practical experience and
- which is known to work well in the real world. Unless I am
- misunderstanding something (a distinct possibility), the "10 ms typical"
- problem may indicate that DTS' idea of what the world is like might not
- be based on wide experience. I like DTS, though. I just wish the
- spurious omission of computational machinery to deal properly with the
- local clock were fixed before international standardhood forces sloppy
- timekeeping (or, at least, a lot sloppier than it has any good reason to
- be) on us all.
-
- The only other concern I had was related to authentication. I just
- wanted to make sure either that DTS was only being targeted for the OSI
- environment or that it was carrying over all the related authentication
- baggage required into the Internet environment. Given the dependence of
- other Internet protocols' authentication schemes on the security (and
- synchronization, even) of the system clock, I think an Internet time
- protocol which lacks authentication will be unusable in a rapidly
- growing number of situations (of course, NTP could really use some help
- in the area of key management, regardless).
-
- ------------------------------------------------------------------------
-
- Date: Wed, 21 Mar 90 14:10:08 PST
- From: comuzzi@took.enet.dec.com
- To: mills@udel.edu
- Subject: I've sent this to everyone else, yours bounced because of a
- typo.
-
- This is a note to continue the DTS/NTP comparison, because I too am
- finding this conversation fruitful. Dennis, I would be glad to mail you
- a copy of the DTS architecture if you want one. (and Dave, I've even
- changed the cover and introduction).
-
- I'd like to address some of the issues Dennis raised. I'll save his
- major point, about accuracy and ease-of-management being orthogonal, for
- last. Allow me to start with the decision of DTS not to support a
- multicast mode. One reason was that protocols which multicast the time
- will be subject to Byzantine failures. (Clearly anyone can just
- multicast any time they want. This problem does not occur when
- multicast is used merely to locate the servers, as in the DTS
- architecture, where multiple servers would have to be co-opted.)
- The second point, the one I was trying to
- raise in my reply to Dave, was that it was DTS's intention to have the
- time and interval information available on every node. The hope was that
- this would permit the proliferation of applications and algorithms which
- used the DTS guarantee (that UTC is contained in the interval). Clearly,
- until such a facility is available on every system, few are going to
- spend a lot of time exploring such issues. One such application is in
- DECnet/OSI phase V management. Events are logged with their time and
- inaccuracy. This permits a network administrator to examine log entries
- and determine that one event could not have caused a second event (if,
- for instance, the interval of the first event doesn't overlap and is
- later than the second.) Obviously this requires the inaccuracy to be
- available on every source of events, that is, every node. The third
- point I'd like to discuss refers to Dave's statement about a
- "considerable Internet constituency which has noisily articulated the
- need for a multicast function when the number of clients on the wire
- climbs to the hundreds." Is it that they wanted multicast? Or was the
- real objection the practical difficulty of adding a second server to a
- LAN of 300 nodes and then having to change the server entry in 150
- ntp.conf files to redistribute the load? (This is a concrete example of
- why I claim NTP is a system administrator's nightmare. However, my real
- interest in this discussion is to separate the problem
- (autoconfiguration) from the particular solution that was chosen
- (multicast the time), and to understand what the motivation of the
- Internet community was.) Clearly DTS responds to the autoconfiguration
- problem, but if the requirement really was that they didn't want to add
- that additional server at all then a multicast scheme is probably the
- only solution. However, one should be clear about what is being traded-
- off, a multicast solution will manifestly have Byzantine failures. I
- think this difference is an interesting one and would like to have more
- discussions about it.
-
- The next topic I'll discuss is the area of the DTS architecture I
- personally believe is least likely to survive the standardization
- process unchanged: The timestamp format. I think we have much agreement
- here. I defended the 100 nanosecond resolution of the DTS timestamp,
- though I'm not particularly happy with it either. I do think it would be
- a win if the network timestamp format equaled the internal timestamp
- format, which argues that the network time format wants to be long
- lived. Even if this argument is rejected, however, there is still a
- price to be paid when the network timestamp runs out - implementations
- which haven't added the rollover code will be unable to operate. The
- question here is how low-level, pervasive and long-lived network nodes
- might become (I've heard suggestions of putting the OSI protocol stack
- in a thermostat, I suppose it could happen.) It's true that sitting here
- today, it's hard to imagine that any code I write will still be
- executing in 2036, but if I were blasting it into ROM I'd be less sure
- about that. I guess I'm also attracted to Dave's suggestion of using
- Julian Day numbering and fractions within the Julian Day, though when I
- try to plug in the numbers and actually design the timestamp I'm forced
- to conclude the total timestamp would have to be a bit bigger. Since
- I've already heard grumbling about the size of DTS's 128 bit timestamp,
- I guess I'll have to think about this further.
-
- There has been a fair amount of discussion about interoperation of the
- two time services. Let me try to clarify what I said (too tersely) in my
- original response to Dave's paper. There are three separate cases I'd
- like to distinguish:
-
- A) An isolated group of DTS systems which obtain time from NTP.
- B) An isolated group of NTP systems which obtain time from DTS.
- C) A collection of DTS and NTP systems giving time to each other.
-
- I believe that without fairly major changes in one or both of the
- architectures case C is a problem, however it is an easily prevented
- problem (see below). Cases A and B however are very interesting, useful
- and I claim easily achievable.
-
- NTP giving time to DTS would make sense in environments where DTS
- systems are being added to LANs where there is no local time provider
- and there is a pre-existing NTP infrastructure. The model for how to do
- this would be to use the DTS time provider interface to import the NTP
- time. Comparison of DTS timestamps within the DTS group would work
- normally; there would, however, be some risk of getting an incorrect
- result if timestamps of two such groups were intercompared. This risk
- can be managed, though: the interoperation would require the user to
- specify an inaccuracy for the NTP time based on local experience with
- NTP; the obvious choice would be some multiple of the synchronization
- distance - the more risk-averse, the larger the multiple. This would (at
- low probability) violate the DTS correctness philosophy. If you want
- certainty, buy two hardware time providers per LAN. Note, however, a
- reasonable compromise would be one hardware time provider and an NTP
- time provider. The DTS fault detection would check the hardware against
- NTP and complain when there was a discrepancy.
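-
- To make the gateway concrete, here is a minimal sketch (in Python, with
- invented names - neither spec defines such an interface call) of how an
- NTP time estimate might be wrapped as a DTS interval using a user-chosen
- multiple of the synchronization distance:
-
```python
def ntp_to_dts_interval(ntp_time, sync_distance, k=2.0):
    """Wrap an NTP time estimate as a DTS-style interval.

    k is the user-chosen safety multiple on the synchronization
    distance; the more risk-averse, the larger the multiple.
    (All names here are illustrative, not from either spec.)
    """
    half_width = k * sync_distance
    return (ntp_time - half_width, ntp_time + half_width)

# e.g. a 30 ms synchronization distance, doubled for safety
lo, hi = ntp_to_dts_interval(1000.0, 0.030, k=2.0)
```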
-
- Case B corresponds to situations where a DTS infrastructure exists (or is
- being added) and there is a need to deliver time to systems using NTP.
- This is the situation I was talking about in my response to Dave. I made
- several assumptions, which I didn't make clear. I assumed that the
- gateway would be both a DTS server and an NTP server, that the NTP code
- would be operating in the "don't change the clock" mode (in the U of
- Maryland implementation this is the -s option; I'm told Dennis's
- implementation has an option in the ntp.conf file to do the same thing)
- and that the NTP clients were on the same LAN. DTS servers synchronize
- with each other at two minute intervals (Note this is server to server -
- clients synchronize at 15 minute intervals. This is Dennis's question.)
- Now, I'm not claiming that "the NTP local clock must experience a jitter
- of the magnitude of the precision of the remote machine's clock," what I
- am claiming is that in normal NTP operation the input data to the NTP
- filtering algorithm must experience a jitter of the magnitude of the
- precision of the remote machine's clock. This is the same magnitude as
- the jitter of the DTS managed clock. I have conducted experiments with
- this configuration and haven't experienced any wild instabilities.
- Again, NTP timestamps generated in two such groups will not be
- intercomparable with each other to the level that they would if the time
- was being delivered by NTP all the way. Currently, however, no
- application can algorithmically depend on distributed time derived
- from NTP unless the algorithm contains a parameter which says "times
- closer together than this will be assumed unordered", that is, unless
- the algorithm imputes an inaccuracy to the NTP time.
-
- Case C potentially breaks the invariants of both protocols. The DTS
- invariant is that UTC is contained within the interval. The NTP
- invariant (I'm less sure of my statement here) is that the frequency of
- good servers agree with UTC. NTP has a further invariant, that there are
- no loops in the time distribution network. This is enforced by the
- stratum. Clearly if DTS took time from a collection of NTP servers and
- later gave it back to the same collection of servers, a loop could and
- probably would occur. There is a simple method to prevent this, I
- propose that the gateway described in case B above always declare itself
- to be at some fairly high (numerically large) stratum. Potential clients
- will ignore the DTS/NTP server in favor of servers which obtained their
- time exclusively via NTP (and have much lower stratum numbers). I'm
- assuming that the NTP implementation at the gateway can be coerced into
- using a fixed stratum and would propose a value of 16 for this purpose.
- There's also a stratum zero which is supposed to be used when the
- stratum is unknown, however I'm not sure I understand what value servers
- which obtain their time from a stratum zero server will use. Do they use
- zero? If so, how are loops prevented amongst themselves?
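-
- The stratum-16 proposal can be sketched as follows (a toy in Python;
- real NTP selection also weighs synchronization distance and sanity
- checks, and all the names here are mine):
-
```python
# The DTS/NTP gateway pins itself at a numerically large stratum so
# that clients with any pure-NTP chain available will ignore it.
GATEWAY_STRATUM = 16

def select_server(servers):
    # toy selection: NTP clients prefer the numerically lowest stratum
    return min(servers, key=lambda entry: entry[1])

servers = [("dts-gateway", GATEWAY_STRATUM), ("ntp-a", 3), ("ntp-b", 2)]
best = select_server(servers)   # the pure-NTP server wins
```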
-
- A few nits before we get to the meat of the discussion. Dennis is
- concerned that the drift has to be input as a management parameter. I'll
- show my vendor colors here and say this isn't a management parameter but
- an implementation detail. The assumption is that when DTS is shipped to
- you from your hardware or software vendor, the drift has been
- autoconfigured. That is, a good implementation of the DTS architecture
- would know the maximum drift rate for the machines that the
- implementation has been certified on. Until some sort of architecture
- neutral distribution format (OSF is attempting to settle on such an
- ANDF) is created this is really easy to do - just hard-code the value
- that's appropriate for the given processor family. If there's wide
- variation within the processor family (or if there are multiple
- processor families due to an ANDF), you'll have to code a table. About
- the only case where the user would have to enter it is where he's
- created his own custom hardware configuration. I guess that doesn't
- bother me.
-
- The second nit has to do with DTS's treatment of leap seconds. I appear
- not to have been clear here. Dave's original document was basically
- correct in its description of how DTS handles leap seconds - Servers
- increase their inaccuracy at the month boundary and a time provider
- narrows the interval later. When I wrote: "Each server has to maintain
- and propagate this state before the leap insertion. This is, of course,
- subject to Byzantine failures. A failing server can insert a bad
- notification." I was describing my understanding of (and a problem with)
- NTP's leap second handling. If my understanding of NTP is incorrect, I
- apologize, but the Byzantine problem seems real to me.
-
- In my reply to Dave I stated "DTS will not claim to have synchronized
- clocks to a level which it has not, even in the presence of asymmetric
- delays. NTP can and has." Dave correctly points out that "NTP does not
- claim to have synchronized to any level, only to minimize the level of
- probabilistic uncertainty and estimate the error incurred." I retract the
- last sentence. What I was trying to say is that DTS makes it clear to
- the user he could be losing in this way, while NTP does not make it
- clear.
-
- Dave asked if I am in substantial agreement with the statistical models
- presented in his first document. I agree with most of this section. My
- only significant disagreement is with the last paragraph. It is true
- that DTS assumes that a system's clock drifts at a rate less than its
- manufacturer's specification, and that a hardware time provider operates
- within specification. The probabilities of these assumptions being false
- are of the same order of magnitude as other hardware failures. Software
- implementations do not routinely checksum memory any more (and they
- certainly don't do it to find memory errors). Violations of these
- assumptions represent faults, just as real as processor faults, and
- should be fixed. Note the long tails you observe in the distributions in
- the Internet are on message transmission times and the like. These
- parameters are dynamically measured in the DTS algorithm. Wick Nichols
- stated: "DTS is willing to accept historical estimates of the
- probability that a clock will go faulty (with checks for faultiness),
- but is not willing to accept historical estimates of current network
- characteristics." in his discussion of this point for the OSF.
-
- Dennis asked a question about DTS authentication in the Internet
- environment. What I personally would like to see is an implementation of
- DTS using Apollo's NCS which in turn used Kerberos authentication. This
- is basically what Digital has proposed to the OSF in response to their
- distributed computing request for technology.
-
- Now to the major contention of Dennis's review, that accuracy and ease-
- of-management are "completely and utterly orthogonal". I disagree with
- this less than a reader of my response to Dave might think, though I am
- somewhat in disagreement with it. What I hold is that ease-of-
- management, provability and accuracy for a time service are all
- interrelated. I believe that ease-of-management is something that must
- be engineered into a system from the start, it can't be tacked on as an
- afterthought. Further, I believe simple systems are easier to manage
- than complex systems. The failure modes that Murphy can find in complex
- systems are just so much more (for lack of a better word) complex. Now,
- much of DTS ease-of-management derives from its autoconfiguration, the
- autoconfiguration is, in turn, dependent on the (relative lack of)
- configuration rules, that is, that synchronizing any two servers just
- works and cannot lead to instabilities. The problem with just adding the
- NTP local clock model to DTS (as I understand the NTP local clock model)
- is that the resulting system could have wild instabilities. (Maybe my
- understanding of NTP is incorrect here.) The dynamic nature of the DTS
- autoconfiguration rules (couriers choosing a random member of the global
- set, for instance) means that the input time driving the local clock model
- will have what Dave calls "phase noise". As I understand NTP's local
- clock model this is where the instability creeps in. Further, the
- existing NTP protocol avoids loops by using a stratum concept, again the
- DTS autoconfiguration happily produces loops. As I noted previously this
- doesn't affect the DTS algorithm, but loops would cause havoc for NTP.
- Again one could add complexity to the DTS algorithm to prevent the
- loops, but I claim one would pay a price in system management cost.
- Another problem (according to Dave) is that the resultant phase locked
- loops have to be analyzed in the light of assumed probability
- distributions, etc., and one does not end up with the sort of proofs of
- correctness that DTS favors. There is one interesting
- aside on this last point. I believe there is a way one could add clock
- training to the DTS model and preserve the correctness. If the training
- algorithm decides to change the rate of the clock by some amount,
- *increase* the maximum drift rate by that amount. I believe this can be
- shown provably correct by the techniques in Marzullo's thesis. However,
- while this improves the precision of the time (the intersystem phase
- differences and rate differences will be smaller) the inaccuracy (the
- guarantee given to the user) will be worse! That DTS has chosen not to
- do this is, of course, the basic philosophical difference about what's
- important showing up again. However, the existence of at least one
- method to incorporate clock training into a provable system gives hope
- that both camps can be satisfied and in particular that the large body of
- work on the NTP local clock model can be incorporated. I am not (yet)
- expert enough in the NTP local clock model to see my way through this.
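-
- The tradeoff described above can be put in numbers (all values invented
- for illustration):
-
```python
# Training shrinks the actual error while the provable bound grows.
rho = 100e-6         # manufacturer's maximum drift bound
true_drift = 60e-6   # actual (unknown) drift of this particular clock
r = -60e-6           # rate correction applied by a training algorithm
t = 900.0            # seconds since the last synchronization

err_untrained = abs(true_drift) * t      # actual error, no training
bound_untrained = rho * t                # provable bound, no training

err_trained = abs(true_drift + r) * t    # actual error, trained: smaller
bound_trained = (rho + abs(r)) * t       # bound inflated by |r|: larger
```
-
- The precision improves while the guarantee handed to the user gets
- worse, exactly the philosophical tension described above.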
-
- The other obvious possibility is to just add the autoconfiguration to
- NTP. To an extent, this is occurring. The multicast functionality
- clearly addresses the ease-of-management issue. However, for NTP servers,
- I claim that choosing the right server is important enough that it can't
- be left to an algorithm. Switching at random between servers reintroduces
- the clock-hopping problem (the extra phase noise produced by the
- clock-hopping will cause problems for NTP). One could attempt to just
- pick a set of servers at random and stick with them for some long time to
- reduce clock hopping, but that will produce serious sub-optimality in
- the case of a changing network configuration (The particular servers
- being synchronized with might become cut-off from their good time
- sources, or the paths to them might involve links which become
- overloaded, and this wouldn't be discovered for a long time.)
-
- My desire is a solution which maintains provability and doesn't require
- a large tradeoff between autoconfiguration and accuracy. So far the only
- improvement I can make in the DTS architecture forces me to give up one
- measure of accuracy to improve another. More work needs to be done here.
-
- ------------------------------------------------------------------------
-
- Date: Sat, 24 Mar 90 19:14:56 EST
- From: Dennis Ferguson <dennis@gw.ccie.utoronto.ca>
- To: comuzzi@took.dec.com
- Cc: Mills@udel.edu, elb@mwunix.mitre.org, marcus@osf.org
- Subject: Re: More discussion of the differences (and similarities) of
-          DTS and NTP.
-
- I realize that, while the NTP local clock processing is not a
- particularly difficult coding exercise, it rests on a foundation which
- is the subject of lots of textbooks and quite a number of academic
- journals. I also realize the presentation in Appendix F of the current
- NTP document certainly does not derive this stuff from first principles,
- since that would require turning the NTP spec into another control
- theory textbook. I can understand the derivation in there (though I
- couldn't have produced it, and could verify it only with great
- difficulty) by virtue of a somewhat academic, traditional engineering
- background (Dave seems to have a very academic, traditional engineering
- background), but I realize the stuff may look more than a little opaque
- if no one has ever forced you to learn/use that stuff. It is, however,
- very worthwhile to get a handle on what is in there, at least
- functionally, because this can make orders of magnitude difference in
- the results you get from your time protocol.
-
- Let me make a number of assertions, some of which I'm not going to be able to
- support here, but which can be discovered by looking very carefully at
- the local clock description. I think you will find that it is not what
- you think.
-
- (1) The NTP local clock does for NTP (and potentially for DTS) what
- adjtime() does for DTS. It is essentially a procedure which is
- called with a time offset as an argument, and which does something
- to the system clock as a result.
-
- (2) Like adjtime(), the NTP local clock is fully deterministic. There
- are no probability considerations here. When you give it a time
- offset, the effect this has on the system clock is fully
- predictable. Hence the NTP local clock can have no effect on your
- ability to maintain correctness, any more than the behaviour of
- adjtime() has any effect on your ability to maintain correctness.
- The NTP local clock does what it is told, no more and no less.
-
- (3) I think the specified NTP local clock is (should be, I have to trust
- Dave's math for this) unconditionally stable for all input. Note
- that "stable" in this context has a very specific meaning, and may
- not be what you expect. If the NTP local clock wasn't
- unconditionally stable for all input with its current parameters, it
- could be made that way by adjusting those parameters. The stability
- of the local clock is predictable.
-
- The local clock, in both NTP and DTS, takes time offsets from UTC as
- input and attempts to adjust the system clock in response to these, in
- principle to make the offsets smaller. Note that this is a feedback
- loop; the adjustments which are made to the system clock affect the
- next offset which is input to the local clock.
-
- Now suppose a DTS client has a clock which drifts by 100 ppm. Suppose it
- also manages to obtain offsets every 15 minutes, by exchanges with its
- servers, which represent the difference between the client's system
- clock and UTC accurately to the nanosecond. The client gives an offset
- to the DTS local clock, essentially adjtime(), which slews the clock by
- the amount of the offset, and then sits around waiting for the next
- offset to arrive. Of course, while adjtime() is slewing the clock
- towards 0 offset, the clock's drift is slewing it away at a rate of 90
- ms per 15 minutes.
-
- At the end of fifteen minutes the whole thing will be repeated. Also, as
- adjtime() will typically slew the clock at a much greater rate than 100
- ppm, the system clock offset from UTC will show a sawtooth waveform
- varying between 0 and 90 ms with a period of 15 minutes, and will
- average 45 ms off. So, for absolutely perfect data in, the DTS local
- clock can get the system clock no closer than 45+-45 ms. This is
- inaccurate (it isn't even meaningful to say that you are getting perfect
- data in, either, since the jitter will be fed back as an uncertainty in
- the input). Indeed, this inaccuracy is not unexpected. This is what is
- called a Type I feedback loop (I think), and is known to be unable to
- track an input without introducing a phase error on the output. Worse,
- for perfect data in consumers of time on that machine will see an
- uncertainty of +-90 ms, even though the true uncertainty is only half
- that (0 to 90 ms), and even though the input data is perfect. This is
- gross.
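-
- This sawtooth is easy to reproduce in a toy simulation (my numbers
- match the scenario above: 100 ppm drift, perfect offsets every 15
- minutes, an idealized instantaneous slew):
-
```python
drift = 100e-6    # fractional frequency error: 100 ppm
interval = 900    # seconds between synchronizations
offset = 0.0
offsets = []
for second in range(10 * interval):
    offset += drift                 # clock drifts away from UTC
    offsets.append(offset)
    if (second + 1) % interval == 0:
        offset = 0.0                # adjtime()-style full correction

peak = max(offsets)                 # ~90 ms at the end of each cycle
mean = sum(offsets) / len(offsets)  # ~45 ms average error
```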
-
- The NTP local clock does much, much better, by a couple of techniques.
- First, the jitter the DTS system clock experiences is essentially
- eliminated by not feeding an entire adjustment to the system clock all
- at once, but rather by making very small adjustments at frequent
- intervals (currently 4 seconds in the spec). The NTP local clock hence
- avoids introducing the sawtooth, the clock changes slowly and smoothly.
- Further, the NTP local clock corrects the average error by computing and
- applying a correction for the drift, by implementing a Type II feedback
- loop. Essentially, for perfect data in, the NTP local clock will
- eventually determine the drift of the system clock and, when it does,
- will maintain the average offset of the system clock at 0. I.e.,
- perfectly accurate. A Type II feedback loop will track a fixed input
- accurately. By applying many little corrections to the system clock
- instead of one big one, NTP will also maintain the system clock
- relatively jitter free (one could say absolutely jitter free in
- comparison to DTS).
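-
- The same scenario through a Type II loop looks very different. The
- sketch below is in the spirit of the NTP local clock but with invented
- gains, not the spec's constants:
-
```python
drift = 100e-6       # true fractional frequency error of the clock
dt = 4.0             # seconds between small adjustments
Kp, Ki = 0.1, 1e-4   # illustrative phase and frequency gains

offset, freq = 0.0, 0.0   # measured offset; estimated rate correction
for _ in range(50000):
    offset += (drift - freq) * dt   # clock drifts, less our compensation
    freq += Ki * offset             # integrate offset into a rate estimate
    offset -= Kp * offset           # small phase slew each interval

# freq converges to the true drift and the offset settles at zero,
# rather than sawtoothing between 0 and 90 ms
```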
-
- Of course, there is no such thing as perfect, but we can begin to assign
- relative orders of magnitude to errors of the DTS and NTP local clock
- schemes (this comparison is deterministic as well, there are no
- probabilities involved). DTS' local clock error is proportional to the
- drift. NTP's local clock error is proportional to the rate of change of
- the drift with respect to time multiplied by the time constant of the
- PLL. In the real world the latter is less than the former by at least
- several orders of magnitude.
-
- I have grossly oversimplified this, but please believe that the details
- are all in the NTP spec if you decode them. Now, how can replacing DTS'
- use of adjtime() with something which is more accurate affect DTS'
- configurability? How could it possibly affect correctness? I stand by my
- original statement, that the issue of accuracy (with respect to the
- local clock) is completely and utterly orthogonal to any issues of
- configuration or management. DTS just didn't include the machinery to
- condition the local clock properly (and apparently is hung up on Type I
- feedback loops for clock control when there are other better ways to do
- it), and this is what is so frustrating about it.
-
- I also think your claim that using local clock processing which does
- frequency compensation would require one to *increase* the inaccuracy
- must either be wrong, or indicate that DTS is doing something very
- silly. Does it make sense that something which can be analytically shown
- to increase accuracy increases the inaccuracy? Does it make sense that,
- if I tune a crystal for zero drift in hardware with a screwdriver that
- the inaccuracy should decrease, while if I tune it for zero drift in
- software the inaccuracy must increase? The mind boggles.
-
- Further, I think you may still be operating in an unreal world with the
- assertion that a manufacturer could possibly define a maximum drift for
- the clock in a particular model of machines, one that could never be
- exceeded under any circumstances which couldn't be called hardware
- failure. I would suggest to you that the very worst cause of clock drift
- in real systems is not hardware at all, but rather lost clock interrupts
- (which cause large, negative drifts). Would you be willing to certify
- that DEC hardware/operating systems never lose clock interrupts under
- any circumstances? Or provide a guaranteed limit to the number that will
- be lost in any time interval? You can call lost clock interrupts a
- "fault" if you wish, but implying that such "faults" are "historically"
- a rare occurrence flies in the face of experience, at least with Unix
- systems.
-
- Which brings up another assertion I am beginning to doubt, that NTP does
- not (or cannot) know the inaccuracy of its time estimate. Joe, take a
- look at how NTP's synchronization distance is accumulated. Don't worry
- about the value of this that the spec suggests be used for stratum 1
- servers, let's assume that this is configured for stratum 1 servers in a
- way which agrees with DTS. Look carefully at the synchronization
- distance a stratum 3 server will receive. Do you see any reason why I
- could not assert that UTC must be contained in an interval which is +-
- 1/2 the synchronization distance from the system clock's time, and prove
- this assertion by the same principles that DTS uses? Or, if not, that
- there are any uncorrectable imperfections in this assertion?
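-
- For concreteness, a sketch of the accumulation (the structure here is
- mine, and whether the correct half-width is the accumulated distance or
- half of it depends on exactly how the spec's quantities are defined):
-
```python
def sync_distance(hops):
    # each hop contributes half its roundtrip delay plus its dispersion
    return sum(delay / 2 + dispersion for delay, dispersion in hops)

# a stratum 3 server two hops from a stratum 1 server:
hops = [(0.030, 0.005), (0.010, 0.002)]
dist = sync_distance(hops)
# the claim: UTC lies within [clock - dist, clock + dist]
```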
-
- I am hence beginning to think as Dave, that NTP's time could be proven
- to lie within known bounds, just as DTS' can. I see nothing that would
- prevent this. If applications needed to know this, I think it could be
- arranged for NTP to provide it.
-
- The question becomes, then, what is all that statistical junk that NTP
- does? I think the issue that is being missed here (and I'm on thin ice
- again) is that DTS does indeed make some assumptions about probability
- distributions. In particular, all correctness can give you is an
- interval which should include UTC. The system clock, however, cannot be
- set to an interval, it needs a specific value. DTS hence arbitrarily
- assumes that any time in the interval is as likely to be UTC as
- any other, and hence picks the middle of the interval, as this minimizes
- the probable error based on that assumption (the fact that not picking
- the middle of the interval increases the inaccuracy interval
- demonstrates a flaw in the DTS protocol which is also exercised by the
- local clock processing. The protocol demands that the intervals be +-
- something even when it might be known that the true interval is
- +something -something-different. DTS hence claims inaccuracy intervals
- which are often bigger than they should be simply because it lacks the
- ability to represent the true state of affairs. This doesn't affect
- correctness, but reduces the utility of knowing the inaccuracy
- interval).
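-
- A small example of that representational flaw (numbers mine): suppose
- it is known that UTC lies between 2 ms behind and 10 ms ahead of the
- clock reading. A symmetric representation centered on the clock must
- report +-10 ms:
-
```python
def symmetric_inaccuracy(clock, lo, hi):
    # forced +- representation centered on the clock's own reading
    return max(clock - lo, hi - clock)

clock, lo, hi = 0.0, -0.002, 0.010
reported = symmetric_inaccuracy(clock, lo, hi)   # 0.010
true_half_width = (hi - lo) / 2                  # 0.006, if re-centered
```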
-
- NTP, however, assumes that the probability distribution over the
- interval is non-uniform, that some times within the interval are more
- likely to be UTC than others (this isn't strictly true, but I see no
- reason why it couldn't be). It proceeds to determine the time within the
- interval which is most likely to be UTC and sets the clock to that. NTP
- does this in part by casting off samples and servers which it thinks are
- less reliable. If DTS can correctly synchronize to a single server,
- however, then casting off servers you aren't fond of can't affect
- correctness. NTP chooses servers it likes (and samples from servers it
- likes) based on presumed characteristics of the probability
- distributions of network traffic. This has no effect on correctness,
- however, and in the extremely unlikely event that the network does not
- behave in the way that NTP expects, NTP's choice of UTC will probably be
- no worse than DTS'. This has no effect on NTP's ability (or lack
- thereof) to produce correct bounds on the estimate, in DTS' sense. Also,
- I find it strange that DTS could claim that avoiding having to "accept
- historical estimates of current network performance" is a feature, when
- this is done by making assumptions about probability distributions which
- have no basis in practical reality at all.
-
- A couple of things come to mind. Joe, you mentioned something about
- "wild instabilities" which Dave said NTP might suffer in the face of
- poor server selection, or some such. I would suggest to you that the
- term "wild instabilities" is one which is relevant only in relation to
- one's expectation of the performance of your time protocol. To anyone
- who thinks an error of 45+-45 ms in the time the system clock returns
- when given perfect data is acceptable, NTP is as solid as a rock. NTP's
- "wild instabilites" are only relevant if your expectations are much
- higher than DTS', since "wild" for NTP is a lot smaller than the
- instabilities DTS apparently considers normal.
-
- More than this, I fail to understand the rest of the arguments about why
- NTP couldn't be retrofitted with an autoconfiguration protocol. NTP has
- no configuration rules; it places itself in the hierarchy based on the
- servers available to it. NTP will operate in such a way as to maximize
- the probable accuracy of its time no matter how it is configured. Give
- your NTP daemon a random set of peers and it will choose the best of
- them, adopt a stratum which is appropriate based on the time sources
- available, and make the best use of the time available. Concerns about
- "phase noise" (i.e. jitter) are again based on expectations of the
- performance of the time protocol; an NTP server which takes time from
- your 45+-45 ms host will survive just fine, and indeed will show far
- less jitter than +-45 ms (look at the local clock, high frequency noise
- is damped out). It is just that NTP servers are expected to be a lot
- closer than 45 ms, so your host looks bad compared to NTP's expectations
- (but not needs). Further, the stuff about synchronization loops is
- irrelevant. The NTP protocol survives loops just fine; it's just that
- the consequence of a loop is that the machines involved count their
- stratum to infinity and disconnect from the synchronization subnet rather
- than continuing to fool themselves that their servers know something
- they don't. This is quite reasonable behaviour since NTP clocks don't
- drift much when left unsynchronized. I would suggest to you that NTP
- knows far more about Murphy than DTS does at this point, since it has
- been tested in far more environments, on far more machines, in likely
- far harsher environments, through far more revisions, for far longer
- than DTS has. There are three independently done implementations of NTP,
- all of which work well and interoperate. How complex can NTP be? NTP can
- be given a random set of servers and work just fine, thanks. This wasn't
- done with auto-configuration specifically in mind, but rather simply to
- meet robustness requirements. NTP has a lot of real world experience to
- prove that it is robust, and that it is certainly robust in the face of
- even gross misconfiguration. What more is needed for an auto-
- configurable protocol? Autoconfiguration certainly couldn't do a worse
- job than people do.
-
- As for Byzantine failures, you are right that NTP's scheme for leap
- second notifications suffers from this, but what is the worst thing that
- can happen if this occurs? Right, your clock ends up a second off. With
- DTS, however, it is a virtual certainty that the clock will end up a
- second off when a leap second occurs, so criticizing NTP for leaving
- this hole is a little like the pot calling the kettle black. That DTS
- increases its inaccuracy by a second is irrelevant for comparison, since
- NTP doesn't maintain this inaccuracy. If (when) NTP maintains an
- inaccuracy interval it should probably increase it by a second during
- leaps as well. This doesn't help keep your clock accurate across leaps,
- though.
-
- For broadcast time, however, I think this is incorrect. The NTP clock
- selection code includes an agreement protocol, and this is still used
- for broadcast time. To cause a failure one would have to co-opt a
- majority of the servers, and this is hardly less robust than DTS. I can
- see little more exposure to such failures with broadcast time than with
- polled time, and I think we must agree that polled NTP is not less
- robust in the face of Byzantine failures than DTS. Further, if you are
- really worried about hostile attacks on your clients then you'll be
- using authentication anyway, in which case there is no additional
- exposure to such failures.
-
- More than this, note that NTP's multicast time is used in the LAN
- environment. The transit delays here are a few milliseconds, and indeed
- my daemon includes partially implemented code to determine these delays
- on the fly by polling. This transit delay isn't even measurable by a
- lot of machines (between machines with 10 or 20 ms clocks you end up
- computing absurdities like negative round trip delays. Does DTS handle
- this?), so DTS isn't necessarily going to know a whole lot about this
- delay by polling anyway. Now, you are willing to accept a 45+-45 ms
- error in the setting of your clock and a 90 ms inaccuracy, for perfect
- data in, due to the primitive local clock processing that DTS does, why
- the heck not add in another, say, 50 ms or something equally outrageous,
- to the inaccuracy interval for the broadcast and forget about it? The
- chance of it ever exceeding 50 ms (or whatever) is of about the same
- order as, say, lost clock interrupts or a hardware failure ruining your
- assumptions about the maximum local oscillator drift. Call the big delay
- a network "fault" and forget about it.
-
- The real advantage of multicast time is that you can update a large
- number of clients very frequently with next to no traffic. One minute
- updates can allow much greater precision under adverse conditions than,
- say, 15 minute updates in any case, yet the cost of serving 300
- clients this way drops from hundreds of packets per minute to one packet
- per minute. You may not care about this, since DTS often seems to me to
- be little concerned with accuracy in any case, but some people like it.
- And note that you've already allowed probabilities to creep into your
- inaccuracy interval by assuming one can configure a maximum drift for
- any system (of course, you punt on this by calling violations of the
- assumption "faults"), it seems to me that assumptions concerning one way
- delays across LANs do not increase the probability of the inaccuracy
- being wrong (and, of course, you can always call such cases "faults").
-
- One comment about NCS and Kerberos for authentication. Authentication is
- notorious for increasing code path lengths (possibly asymmetrically) and
- this affects accuracy adversely. This is in part why NTP includes an
- integral authentication protocol, because it is concerned with accuracy
- and integrating the authentication allows it to optimally control the
- damage this does. Because NTP is concerned with accuracy (it is apparent
- that DTS is far less so) I can't ever see the internal crypto-checksum
- code being moved to an external agent (I would object if it was), since
- you can always do a better job, in the real time sense, with this
- incorporated as part of the protocol and coded by someone who is aware
- of the issues. NTP could use help with key management, though.
-
- Actually, upon rereading this note I find (in addition to it being far
- longer than it should be) that I've taken on too much of a pro-NTP
- debating tone. I apologize in advance. I guess the part that galled me
- is the "not willing to accept historical estimates of current network
- performance" comment as an excuse to ignore NTP's well developed, well
- tested time keeping technology when designing the DTS protocol, when
- what has really been done is to replace NTP's observation-based
- assumptions about the probability distributions of network gunk with
- assumptions concerning probabilities which have no basis in the real
- world and which are made strictly to avoid exposing a defect in DTS'
- representation of the inaccuracy. Dave's simulation results indicate
- that it is indeed the case that you've made DTS work worse in the real
- world than NTP (at keeping the clock correct, which is what I desire
- most from my time keeping protocol, not at computing correct error
- bounds, which I desire less anyway). NTP, with its current-reality based
- assumptions on network characteristics, couldn't do worse than DTS,
- whose assumptions seem based on nothing. Couple this with the stability
- and accuracy that NTP's local clock gives it (and the local clock is
- deterministic, the "historical estimates" fuzz can't even be used to
- justify ignoring this) and you've got a powerful, proven time keeping
- protocol. And DTS ignored all of it. Missing the local clock in
- particular is unforgivable.
-
- NTP doesn't produce a "correct" inaccuracy, but there doesn't seem to me
- to be any reason which prevents it from doing so. The statistical
- assumptions it makes have no effect on this. If there are consumers who
- would like to know the inaccuracy (I would) I think NTP could provide
- it, perhaps without even changing packet format. And NTP doesn't have an
- autoconfiguration protocol, but it has wide experience with people-
- configuration and you couldn't possibly design an auto-configuration
- protocol which does worse than people.
-
- I am sorry for the tone of this, but I can't help but take issue with the
- (apparent in DTS' design) attitude that nothing in NTP was worth looking
- at (from my perspective DTS' treatment of the local clock is
- horrendously primitive and simple minded, for example). I understand
- from your last note, however, that you are maybe growing more sensitive
- to this. If we are to standardize a time protocol, let us make it a good
- one by not ignoring existing experience.
-
- ------------------------------------------------------------------------
-
- 24-MAR-1990 19:15:29.73
- From: Michael Soha LKG1-2/A19 226-7658 <soha@nerva.enet.dec.com>
- To: comuzzi@took.dec.com
- CC: Mills@udel.edu, elb@mwunix.mitre.org, marcus@osf.org
- Subj: Re: More discussion of the differences (and similarities) of
- DTS and NTP.
-
- I realize that, while the NTP local clock processing is not a
- particularly difficult coding exercise, it rests on a foundation
- which is the subject of lots of textbooks and quite a number of
- academic journals. I also realize the presentation in Appendix F of
- the current NTP document certainly does not derive this stuff from
- first principles, since that would require turning the NTP spec
- into another control theory textbook. I can understand the
- derivation in there (though I couldn't have produced it, and could
- verify it only with great difficulty) by virtue of a somewhat
- academic, traditional engineering background (Dave seems to have a
- very academic, traditional engineering background), but I realize
- the stuff may look more than a little opaque if no one has ever
- forced you to learn/use that stuff. It is, however, very
- worthwhile to get a handle on what is in there, at least functionally,
- because this can make orders of magnitude difference in the results
- you get from your time protocol.
-
- If the author is referring to Control Theory I agree that it is the
- subject of many textbooks and journal articles. However, if he's
- alluding to the modeling of quartz oscillators and their stability I
- have to disagree. My references, [2-6], do not use the model employed by
- NTP. Refer to my initial discussion on training. As an aside, my NTP
- documentation, RFC 1119, does not have an Appendix F. I assume the
- author is referring to Section 5, 'Local Clocks' [Ref 1].
-
- Speaking for myself, I took several courses in Control Theory, not so
- much because I was "forced", but simply because I enjoyed the subject.
-
- Let me make a number of assertions, some of which I'm not going to be
- able to
- support here, but which can be discovered by looking very carefully at
- the local clock description. I think you will find that it is not what
- you think.
-
- (1) The NTP local clock does for NTP (and potentially for DTS) what
- adjtime() does for DTS. It is essentially a procedure which is
- called with a time offset as an argument, and which does
- something to the system clock as a result.
-
- I agree. Both NTP and DTSS are time synchronization protocols.
-
- (2) Like adjtime(), the NTP local clock is fully deterministic.
- There are no probability considerations here. When you give it
- a time offset, the effect this has on the system clock is fully
- predictable. Hence the NTP local clock can have no effect on
- your ability to maintain correctness, any more than the
- behaviour of adjtime() has any effect on your ability to
- maintain correctness. The NTP local clock does what it is told,
- no more and no less.
-
- Deterministic systems can be wrong. I don't think Joe was trying
- to say that NTP is indeterministic (God doesn't play dice with NTP). I
- think what he was reacting to was the lack of a correctness proof and a
- definition for correctness. As discussed previously, I believe the NTP
- clock model is unsound.
-
- (3) I think the specified NTP local clock is (should be, I have to
- trust Dave's math for this) unconditionally stable for all
- input. Note that "stable" in this context has a very specific
- meaning, and may not be what you expect. If the NTP local clock
- wasn't unconditionally stable for all input with its current
- parameters, it could be made that way by adjusting those
- parameters. The stability of the local clock is predictable.
-
- Reading this paragraph I see the author saying NTP should be stable and
- if it isn't we can fix it. Referring to 'Modern Control Engineering' by
- Ogata [Ref 10], pg 7 "From the point of view of stability, the open loop
- control system is easier to build since stability is not a major
- problem. On the other hand, stability is always a major problem in the
- closed loop system since it may tend to overcorrect errors which cause
- oscillations of constant or changing amplitude". My interpretation of
- stability is just that, no oscillations. See Ogata, pg 217 'Absolute
- stability, relative stability, and steady state error' for additional
- discussion. Furthermore, while we are on the topic of stability, one has
- to acknowledge that a type II system is more apt to be unstable than a
- type I system. Again from Ogata, pg 284, "As the type number is
- increased, accuracy is improved; however, increasing the type number
- aggravates the stability problem. A compromise between steady state
- accuracy and relative stability is always necessary". There are no easy
- answers in this world. Is the NTP clock model the proper choice in terms
- of accuracy and relative stability?
-
- In summary, I agree with only one of Dennis' assertions, NTP and DTSS
- are designed to synchronize clocks in a computer network.
-
- The local clock, in both NTP and DTS, takes time offsets from UTC as
- input and attempts to adjust the system clock in response to these, in
- principle to make the offsets smaller. Note that this is a feedback
- loop: the adjustments which are made to the system clock affect the
- next offset which is input to the local clock.
-
- Now suppose a DTS client has a clock which drifts by 100 ppm.
- Suppose it also manages to obtain offsets every 15 minutes, by
- exchanges with its servers, which represent the difference between
- the client's system clock and UTC accurately to the nanosecond. The
- client gives an offset to the DTS local clock, essentially
- adjtime(), which slews the clock by the amount of the offset, and
- then sits around waiting for the next offset to arrive. Of course,
- while adjtime() is slewing the clock towards 0 offset, the clock's
- drift is slewing it away at a rate of 90 ms per 15 minutes. At the
- end of fifteen minutes the whole thing will be repeated. Also, as
- adjtime() will typically slew the clock at a much greater rate than
- 100 ppm, the system clock offset from UTC will show a sawtooth
- waveform varying between 0 and 90 ms with a period of 15 minutes,
- and will average 45 ms off. So, for absolutely perfect data in, the
- DTS local clock can get the system clock no closer than 45+-45 ms.
- This is inaccurate (it isn't even meaningful to say that you are
- getting perfect data in, either, since the jitter will be fed back
- as an uncertainty in the input). Indeed, this inaccuracy is not
- unexpected. This is what is called a Type I feedback loop (I
- think), and is known to be unable to track an input without
- introducing a phase error on the output. Worse, for perfect data in
- consumers of time on that machine will see an uncertainty of +-90
- ms, even though the true uncertainty is only half that (0 to 90
- ms), and even though the input data is perfect. This is gross.
-
- I agree that if one has a very poor oscillator, it will drift on the
- order of 90 msec/15 minutes. However, before I apply the NTP clock model
- and expect to drive the stability to one part in 10**8, I'd first
- consider what is actually possible (ie, technically sound). Another
- aside, the statement "..UTC accurately to the nanosecond" is meaningless
- for several reasons. First, the current state of art for time transfer
- (using the Global Positioning System, GPS) can achieve a precision of a
- few nanoseconds Ref [5,8]. This precision is achieved through the use of
- cesium clocks, very precise position information, and long integration
- times. It is unreasonable, to say the least, to expect to
- achieve this precision in a computer network. Secondly, the use of UTC
- (as defined by the CCIR, Ref 9) states that '... UTC enables events to
- be determined with an uncertainty of 1 us;'. If you are looking at time
- transfer with a precision on the order of nanoseconds, one must specify
- the Timing Center, eg, UTC(NIST) (see Ref 11 for a description). The term
- Type I feedback is a reference to the steady state error characteristics
- of a control system. Saying that a Type I is unable to track an input
- without error is wrong (see Ref 10 ppg 283-292). A Type I system
- can track a step input with ZERO steady-state error. For a ramp input it
- will have a finite error, which can be reduced by tailoring the system. A
- Type II system can track a ramp input with zero steady-state error.
- As discussed earlier in response '5', a proper solution is always a
- compromise between several parameters (eg, stability, steady-state
- error, transient errors, over-shoot, time to first zero, etc...). In
- other words, a type II system isn't always better than a type I.
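-
- The DTS sawtooth described in the quoted paragraph is easy to reproduce
- numerically. This short simulation (hypothetical parameters taken from
- the discussion: a 100 ppm drift fully corrected every 15 minutes; this
- is not DTS code) shows the 0-90 ms sawtooth averaging about 45 ms:

```python
drift = 100e-6            # 100 ppm, the figure used in the discussion
interval = 15 * 60        # seconds between full corrections
offset = 0.0              # clock offset from UTC, seconds
offsets = []
for second in range(4 * interval):
    if second % interval == 0:
        offset = 0.0      # adjtime()-style slew, treated as instantaneous
    offset += drift       # one second of drift pulls the clock away again
    offsets.append(offset)

peak = max(offsets)                 # peak error, about 90 ms
mean = sum(offsets) / len(offsets)  # average error, about 45 ms
assert 0.089 < peak <= 0.0901
assert 0.044 < mean < 0.046
```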
-
- The NTP local clock does much, much better, by a couple of
- techniques. First, the jitter the DTS system clock experiences is
- essentially eliminated by not feeding an entire adjustment to the
- system clock all at once, but rather by making very small
- adjustments at frequent intervals (currently 4 seconds in the
- spec). The NTP local clock hence avoids introducing the sawtooth,
- the clock changes slowly and smoothly. Further, the NTP local clock
- corrects the average error by computing and applying a correction
- for the drift, by implementing a Type II feedback loop.
- Essentially, for perfect data in, the NTP local clock will
- eventually determine the drift of the system clock and, when it
- does, will maintain the average offset of the system clock at 0.
- I.e., perfectly accurate. A Type II feedback loop will track a
- fixed input accurately. By applying many little corrections to the
- system clock instead of one big one, NTP will also maintain the
- system clock relatively jitter free (one could say absolutely
- jitter free in comparison to DTS).
-
- For perfect data in, the NTP local clock MODEL may be able to reduce the
- offset to zero. However, for an actual quartz oscillator, it is
- impossible. There will always be short-term drift, as discussed at the
- beginning of this memo. As for Type II vs Type I, it is naive to say
- that a Type II system is always better than a Type I. Better in what
- sense? Less stable? More accurate? More transient errors?
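-
- The type II mechanism claimed in the quoted paragraph can be sketched
- as follows. This is a toy loop with arbitrary illustrative gains and
- perfect measurements assumed, not the NTP algorithm: an integral
- frequency term learns a constant drift, so the mean offset is driven
- toward zero rather than settling at a fixed phase error:

```python
drift = 100e-6          # true oscillator drift (100 ppm)
freq_corr = 0.0         # learned frequency correction
offset = 0.0            # clock offset from UTC, seconds
interval = 60.0         # seconds between measurements
kp, ki = 0.5, 0.05      # arbitrary illustrative loop gains

history = []
for _ in range(2000):
    offset += (drift - freq_corr) * interval  # clock wanders between updates
    measured = offset                         # assume perfect measurements
    offset -= kp * measured                   # proportional (phase) correction
    freq_corr += ki * measured / interval     # integral term learns the drift
    history.append(offset)

tail = history[-100:]
assert abs(sum(tail) / len(tail)) < 1e-6      # mean offset driven to ~zero
assert abs(freq_corr - drift) < 1e-6          # the 100 ppm drift was learned
```

With these gains the loop happens to be stable; as the Ogata quotations
above note, a type II loop is only as good as its tuning.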
-
- Of course, there is no such thing as perfect, but we can begin to
- assign relative orders of magnitude to errors of the DTS and NTP
- local clock schemes (this comparison is deterministic as well,
- there are no probabilities involved). DTS' local clock error is
- proportional to the drift. NTP's local clock error is proportional
- to the rate of change of the drift with respect to time multiplied
- by the time constant of the PLL. In the real world the latter is
- less than the former by at least several orders of magnitude.
-
- It is wrong to say that DTS' local clock error is proportional to the
- drift. The inaccuracy will grow at a rate equal to the maximum drift.
- The test data to date indicates that most VAX oscillators are much better
- than 1 part in 10**4. So the difference between two DTS local clocks
- will be much better than the max drift. So why use 1 part in 10**4? That
- number accounts for all expected drift components: initial offset; aging
- for the lifetime of the product, 10 years; and environment (temperature,
- humidity, voltage supply, etc).
-
- I have a real problem with the quote 'several orders of magnitude'. I
- have seen in several discussions on NTP that it can achieve stabilities
- on the order of 1 part in 10**8. All of the data that I have seen on
- quartz oscillators [Ref 2-7] indicate that it is impossible for
- uncompensated oscillators to reach a stability of one part in 10**8.
-
- I have grossly oversimplified this, but please believe that the
- details are all in the NTP spec if you decode them. Now, how can
- replacing DTS' use of adjtime() with something which is more
- accurate affect DTS' configurability? How could it possibly affect
- correctness? I stand by my original statement, that the issue of
- accuracy (with respect to the local clock) is completely and
- utterly orthogonal to any issues of configuration or management.
- DTS just didn't include the machinery to condition with the local
- clock (and apparently is hung up on type I feedback loops for clock
- control when there are other better ways to do it), and this is
- what is so frustrating about it.
-
- If one adds complexity to increase the accuracy of a system, then it may
- result in additional management and/or configurability. In fact I'd
- argue that this is true more often than not. Saying the two are
- completely and utterly orthogonal is overstating your case.
-
- Personally I like control theory, mainly because it is a challenge to do
- the right thing. The question of Type I vs Type II is (for me at least)
- unrelated to why I don't like the NTP clock model. I simply believe that
- it is technically unsound to 'train' an uncompensated oscillator
- [Ref 2-6].
-
- I also think your claim that using local clock processing which
- does frequency compensation would require one to *increase* the
- inaccuracy must either be wrong, or indicate that DTS is doing
- something very silly. Does it make sense that something which can
- be analytically shown to increase accuracy increases the
- inaccuracy? Does it make sense that, if I tune a crystal for zero
- drift in hardware with a screwdriver that the inaccuracy should
- decrease, while if I tune it for zero drift in software the
- inaccuracy must increase? The mind boggles.
-
- The reason you may want to consider increasing the inaccuracy is for
- robustness. That is, if the correction is correct 99% of the time, then
- for the 1% of the time that the correction is incorrect, you will still
- contain UTC in the time interval.
-
- Further, I think you may still be operating in an unreal world with
- the assertion that a manufacturer could possibly define a maximum
- drift for the clock in a particular model of machines, one that
- could never be exceeded under any circumstances which couldn't be
- called hardware failure. I would suggest to you that the very worst
- cause of clock drift in real systems is not hardware at all, but
- rather lost clock interrupts (which cause large, negative drifts).
- Would you be willing to certify that DEC hardware/operating systems
- never lose clock interrupts under any circumstances? Or provide a
- guaranteed limit to the number that will be lost in any time
- interval? You can call lost clock interrupts a "fault" if you wish,
- but implying that such "faults" are "historically" a rare
- occurrence flies in the face of experience, at least with Unix
- systems.
-
- Stating a maximum number for drift can be done if one accounts for all
- of the contributors to drift (short and long term). One only needs to
- examine the oscillator specification. The main contributors are: aging,
- initial accuracy and temperature stability. Accounting for these
- contributors one can easily show that a stability of 1 part in 10**4 is
- a proper choice (assuming a lifetime of 10 years). I agree that accounting
- for missing clock interrupts is a very difficult problem. Does NTP have
- a solution for this?
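-
- The budget Soha describes can be tallied directly. The figures below
- are illustrative spec-sheet numbers, not taken from any particular DEC
- part; the point is only that plausible contributors sum to well inside
- 1 part in 10**4 (100 ppm):

```python
initial_tolerance = 25.0     # ppm, assumed crystal cut tolerance
aging = 3.0 * 10             # ppm, an assumed 3 ppm/year over a 10-year life
temperature = 30.0           # ppm over the operating range (assumed)
worst_case = initial_tolerance + aging + temperature
assert worst_case <= 100.0   # inside the 1-part-in-10**4 budget
```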
-
- Which brings up another assertion I am beginning to doubt, that NTP
- does not (or cannot) know the inaccuracy of its time estimate. Joe,
- take a look at how NTP's synchronization distance is accumulated.
- Don't worry about the value of this that the spec suggests be used
- for stratum 1 servers, let's assume that this is configured for
- stratum 1 servers in a way which agrees with DTS. Look carefully at
- how the synchronization distance a stratum 3 server will receive is
- accumulated.
- Do you see any reason why I could not assert that UTC must be
- contained in an interval which is +-1/2 the synchronization
- distance from the system clock's time, and prove this assertion by
- the same principles that DTS uses? Or, if not, that there are any
- uncorrectable imperfections in this assertion?
-
- I agree that NTP may be able to provide an inaccuracy. But you may need
- additional data to account for processing delays and the local clock
- resolution for each NTP node from the stratum 1 server.
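-
- How such an inaccuracy might accumulate down the strata can be sketched
- as follows. The per-hop numbers are hypothetical, and the rule shown
- (half the round-trip delay plus a dispersion term per hop) is a
- simplification of NTP's synchronization distance, not the exact
- specification:

```python
def add_hop(parent_distance, round_trip_delay, dispersion):
    # each hop contributes half its round-trip delay plus its dispersion
    return parent_distance + round_trip_delay / 2 + dispersion

stratum1 = 0.001                            # assumed reference distance, 1 ms
stratum2 = add_hop(stratum1, 0.020, 0.005)  # 20 ms RTT, 5 ms dispersion
stratum3 = add_hop(stratum2, 0.040, 0.010)  # 40 ms RTT, 10 ms dispersion
assert abs(stratum2 - 0.016) < 1e-9
assert abs(stratum3 - 0.046) < 1e-9
```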
-
- The question becomes, then, what is all that statistical junk that
- NTP does? I think the issue that is being missed here (and I'm on
- thin ice again) is that DTS does indeed make some assumptions about
- probability distributions. In particular, all correctness can give
- you is an interval which should include UTC. The system clock,
- however, cannot be set to an interval, it needs a specific value.
- DTS hence arbitrarily assumes that any time in the interval is as
- likely to be UTC as any other, and hence picks the middle
- of the interval as this minimizes the probable error based on that
- assumption (the fact that not picking the middle of the interval
- increases the inaccuracy interval demonstrates a flaw in the DTS
- protocol which is also exercised by the local clock processing. The
- protocol demands that the intervals be +-something even when it
- might be known that the true interval is +something -something-
- different. DTS hence claims inaccuracy intervals which are often
- bigger than they should be simply because it lacks the ability to
- represent the true state of affairs. This doesn't affect
- correctness, but reduces the utility of knowing the inaccuracy
- interval).
-
- See my response [above] and my initial discussion on training. DTS is
- not optimal since it balances the inaccuracy about the midpoint. One
- need not use a balanced interval: one could instead provide three
- datapoints (time, +inacc, and -inacc); however, we decided that this
- optimization was beyond the point of diminishing returns.
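-
- The three-datapoint idea can be illustrated directly (illustrative
- values only): an asymmetric interval carried as (time, +inacc, -inacc)
- can be strictly narrower than the balanced +-inacc needed to cover it:

```python
plus_inacc, minus_inacc = 0.010, 0.030   # true interval: +10 ms, -30 ms

# a balanced +-inacc around the same clock value must cover the worse side
balanced_half_width = max(plus_inacc, minus_inacc)

true_width = plus_inacc + minus_inacc        # 40 ms actually needed
claimed_width = 2 * balanced_half_width      # 60 ms claimed
assert claimed_width > true_width            # the balanced form over-claims
```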
-
- NTP, however, assumes that the probability distribution over the
- interval is non-uniform, that some times within the interval are
- more likely to be UTC than others (this isn't strictly true, but I
- see no reason why it couldn't be). It proceeds to determine the
- time within the interval which is most likely to be UTC and sets
- the clock to that. NTP does this in part by casting off samples and
- servers which it thinks are less reliable. If DTS can correctly
- synchronize to a single server, however, then casting off servers
- you aren't fond of can't affect correctness. NTP chooses servers it
- likes (and samples from servers it likes) based on presumed
- characteristics of the probability distributions of network
- traffic. This has no effect on correctness, however, and in the
- extremely unlikely event that the network does not behave in the
- way that NTP expects, NTP's choice of UTC will probably be no worse
- than DTS'. This has no effect on NTP's ability (or lack thereof) to
- produce correct bounds on the estimate, in DTS' sense. Also, I find
- it strange that DTS could claim that avoiding having to "accept
- historical estimates of current network performance" is a feature,
- when this is done by making assumptions about probability
- distributions which have no basis in practical reality at all.
-
- 'most likely to be UTC' describes why NTP is different from DTSS. NTP
- focuses on accuracy while DTSS main goal is to always include UTC in the
- computed time interval. Neither one is necessarily better than the
- other.
-
- As described in Dennis' memo and my responses, NTP uses a different
- control mechanism than DTS (eg type II vs Type I). Given this, it is
- difficult to accept the claim that NTP's choice will be no worse than
- DTSS'.
-
- Two points. 1. Historical estimates of network performance may be stale
- due to changes in the network layer (different routes) and the datalink
- layer (changes in the bridge topology). How does one ensure that the
- network performance estimates reflect the current state of affairs?
- 2. The statement that 'DTS's assumptions have no basis in reality' is
- false. See response 14 and my initial discussion on reality.
-
- A couple of things come to mind. Joe, you mentioned something about
- "wild instabilities" which Dave said NTP might suffer in the face
- of poor server selection, or some such. I would suggest to you that
- the term "wild instabilities" is one which is relevant only in
- relation to one's expectation of the performance of your time
- protocol. To anyone who thinks an error of 45+-45 ms in the time
- the system clock returns when given perfect data is acceptable, NTP
- is as solid as a rock. NTP's "wild instabilites" are only relevant
- if your expectations are much higher than DTS', since "wild" for
- NTP is a lot smaller than the instabilities DTS apparently
- considers normal.
-
- I thought this comment originally came from Dave (wrt DTSS giving time
- to NTP). It may have been overstated, however the point remains that it
- is more difficult to ensure stability in a Type II system (as compared
- to a type I).
-
- More than this, I fail to understand the rest of the arguments about why
- NTP couldn't be retrofitted with an autoconfiguration protocol. NTP has
- no configuration rules; it places itself in the hierarchy based on the
- servers available to it. NTP will operate in such a way as to maximize
- the probable accuracy of its time no matter how it is configured. Give
- your NTP daemon a random set of peers and it will choose the best of
- them, adopt a stratum which is appropriate based on the time sources
- available, and make the best use of the time available. Concerns about
- "phase noise" (i.e. jitter) are again based on expectations of the
- performance of the time protocol; an NTP server which takes time from
- your 45+-45 ms host will survive just fine, and indeed will show far
- less jitter than +-45 ms (look at the local clock, high frequency noise
- is damped out). It is just that NTP servers are expected to be a lot
- closer than 45 ms, so your host looks bad compared to NTP's expectations
- (but not needs). Further, the stuff about synchronization loops is
- irrelevant. The NTP protocol survives loops just fine, it's just that
- the consequence of a loop is that the machines involved count their
- stratum to infinity and disconnect from the synchronization subnet rather
- than continuing to fool themselves that their servers know something
- they don't. This is quite reasonable behaviour since NTP clocks don't
- drift much when left unsynchronized. I would suggest to you that NTP
- knows far more about Murphy than DTS does at this point, since it has
- been tested in far more environments, on far more machines, in likely
- far harsher environments, through far more revisions, for far longer
- than DTS has. There are three independently done implementations of NTP,
- all of which work well and interoperate. How complex can NTP be? NTP can
- be given a random set of servers and work just fine, thanks. This wasn't
- done with auto-configuration specifically in mind, but rather simply to
- meet robustness requirements. NTP has a lot of real world experience to
- prove that it is robust, and that it is certainly robust in the face of
- even gross misconfiguration. What more is needed for an auto-
- configurable protocol? Autoconfiguration certainly couldn't do a worse
- job than people do.
-
- As for Byzantine failures, you are right that NTP's scheme for leap
- second notifications suffers from this, but what is the worst thing
- that can happen if this occurs? Right, your clock ends up a second
- off. With DTS, however, it is a virtual certainty that the clock
- will end up a second off when a leap second occurs, so criticizing
- NTP for leaving this hole is a little like the pot calling the
- kettle black. That DTS increases its inaccuracy by a second is
- irrelevant for comparison, since NTP doesn't maintain this
- inaccuracy. If (when) NTP maintains an inaccuracy interval it
- should probably increase it by a second during leaps as well. This
- doesn't help keep your clock accurate across leaps, though.
-
- The statement that 'NTP should increase its inaccuracy' implies that
- DTSS is doing the right thing. Am I confused?
-
- For broadcast time, however, I think this is incorrect. The NTP
- clock selection code includes an agreement protocol, and this is
- still used for broadcast time. To cause a failure one would have to
- co-opt a majority of the servers, and this is hardly less robust
- than DTS. I can see little more exposure to such failures with
- broadcast time than with polled time, and I think we must agree
- that polled NTP is not less robust in the face of Byzantine
- failures than DTS. Further, if you are really worried about hostile
- attacks on your clients then you'll be using authentication anyway,
- in which case there is no additional exposure to such failures.
- More than this, note that NTP's multicast time is used in the LAN
- environment. The transit delays here are a few milliseconds, and
- indeed my daemon includes partially implemented code to determine
- these delays on the fly by polling. This transit delay isn't even
- measurable by a lot of machines (between machines with 10 or 20 ms
- clocks you end up computing absurdities like negative round trip
- delays. Does DTS handle this?), so DTS isn't necessarily going to
- know a whole lot about this delay by polling anyway. Now, you are
- willing to accept a 45+-45 ms error in the setting of your clock
- and a 90 ms inaccuracy, for perfect data in, due to the primitive
- local clock processing that DTS does, why the heck not add in
- another, say, 50 ms or something equally outrageous, to the
- inaccuracy interval for the broadcast and forget about it? The
- chance of it ever exceeding 50 ms (or whatever) is of about the
- same order as, say, lost clock interrupts or a hardware failure
- ruining your assumptions about the maximum local oscillator drift.
- Call the big delay a network "fault" and forget about it.
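-
- The four-timestamp arithmetic behind that negative-delay absurdity can
- be sketched as follows. This is a minimal illustration of the standard
- NTP offset/delay calculation; the 10 ms quantizer is a hypothetical
- stand-in for a coarse system clock, not code from either spec:
-
```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP four-timestamp calculation.

    t1: client transmit, t2: server receive, t3: server transmit,
    t4: client receive (all in seconds).  Returns (offset, delay)."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay

def quantize(t, tick=0.010):
    """Hypothetical 10 ms clock granularity.  On a LAN whose transit
    time is well under one tick, timestamps rounded this way can make
    the computed round-trip delay come out zero or even negative."""
    return int(t / tick) * tick
```
- With a true offset of 50 ms and 5 ms transit each way, the calculation
- recovers both exactly; feed it quantized timestamps and the delay term
- is the first casualty.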
-
- How do you account for changes in the bridged LAN and/or remote
- bridges? Most LANs are bridged, and a change in topology will affect
- the network delay. Furthermore, if one has a remote bridge, say at 56
- kbps, then for each minimum-size packet (64 bytes) another 5 msec of
- ONEWAY delay (i.e., asymmetric) will be introduced. We have measured
- ONEWAY delays on LANs (with only local bridges) as high as 100 msec.
-
- As discussed in previous comments I do not agree that NTP will work the
- same as DTSS. As for the local clock, I do not believe that it is
- technically sound to 'train' an uncompensated clock (see discussion on
- training at beginning of this memo.)
-
- I am sorry for the tone of this, but I can't help but take issue with
- the (apparent in DTS' design) attitude that nothing in NTP was
- worth looking at (from my perspective DTS' treatment of the local
- clock is horrendously primitive and simple minded, for example). I
- understand from your last note, however, that you are maybe growing
- more sensitive to this. If we are to standardize a time protocol,
- let us make it a good one by not ignoring existing experience.
-
- To be frank, I would say that NTP's view that an uncompensated clock
- can be 'trained' to one part in 10**8 has no foundation in the
- scientific literature.
-
- REFERENCES
-
- 1. Mills, D. L., "Network Time Protocol (Version 2) Specification and
- Implementation", RFC 1119, University of Delaware, September 1989
-
- 2. Vig, J. R., "Quartz Crystal Resonators & Oscillators For Frequency
- Control and Timing Applications", SLCET-TR-88-1, US Army Electronics
- Technology and Devices Laboratory, Fort Monmouth, New Jersey,
- January 1988.
-
- 3. Bottom, V. E., "Introduction to Quartz Crystal Unit Design", Van
- Nostrand Reinhold Electrical/Computer Science and Engineering
- Series, New York, 1982
-
- 4. Frerking, M. E., "Crystal Oscillator Design and Temperature
- Compensation", Van Nostrand Reinhold Company/Litton Educational
- Publishing, 1978
-
- 5. NIST, "Time and Frequency Seminar - June 14, 15, 16, 1988", Time
- and Frequency Division, NIST, Boulder, Colorado.
-
- 6. NIST, "Time and Frequency: Theory and Fundamentals", NBS Monograph
- 140, SD Catalog No. C13.44:140, Boulder, Colorado.
-
- 7. VECTRON, "Crystal Oscillators 1989", VECTRON Laboratories, Inc,
- Norwalk, Connecticut.
-
- 8. Imae, M. et al., "A dual frequency GPS receiver measuring
- ionospheric effects without code demodulation and its application
- to time comparisons", Proceedings of the 20th Annual Precise Time
- and Time Interval (PTTI) Applications and Planning Meeting, Vienna,
- Virginia, 1988.
-
- 9. CCIR, "Recommendations and Reports of the CCIR, 1986 - Standard
- Frequencies and Time Signals", XVIth Plenary, Dubrovnik, 1986.
-
- 10. Ogata, K., "Modern Control Engineering", Prentice Hall Inc,
- Englewood Cliffs, New Jersey, 1970.
-
- 11. NIST, "Time & Frequency Bulletin No. 388, March 1990", NISTR 90-
- 3940-3 (a monthly report from NIST), Time and Frequency Division,
- NIST, Boulder, Colorado.
-
- ------------------------------------------------------------------------
-
- Date: Tue, 27 Mar 90 4:59:06 GMT
- From: Mills@udel.edu
- To: comuzzi@took.enet.dec.com
- cc: mills@udel.edu, dennis@gw.ccie.utoronto.ca
- Subject: Re: I've sent this to everyone else, yours bounced because of
- a typo.
-
- ... This is a note to continue the DTS/NTP comparison, because I
- too am finding this conversation fruitful. Dennis, I would be glad
- to mail you a copy of the DTS architecture if you want one. (and
- Dave, I've even changed the cover and introduction). ...
-
- Yeah, I've learned something, too. Are you game to cast the document as
- an RFC? It's too bad we didn't schmooze while the stove was still
- simmering the kettle. Is the kettle still warm?
-
- ... Allow me to start with the decision of DTS not to support a
- multicast mode. One reason was that protocols which multicast the
- time will be subject to Byzantine failures. ...
-
- There are two issues here, one pragmatic and the other hidden. Since NTP
- multicast clients may enjoy multiple NTP multicast servers on the same
- wire, Byzantine vulnerabilities are reduced. The only thing you lose is
- the synchronization delay (aka inaccuracy interval - darn, we should
- have both called that the "confidence interval"), which in Unix
- community is imperceptible. The hidden agenda is to explore the utility
- of the new IP multicast capability, which is quickly becoming ubiquitous
- in the Internet R&D community. While I would like to make the case that
- multicasting should be supported on its own merits, I have great
- nervousness about such scenarios as CMU, which as you probably know,
- runs what could be called a godzilla network of intertwined wires and
- bridges. I am told that each of several NTP servers now serves upwards
- of 500 clients. If each one of those client dudes expects to rattle
- chimes of, say, three servers each, the induced RF field might misdirect
- planes 50 miles away.
-
- ... The second point, the one I was trying to raise in my reply to
- Dave, was that it was DTS's intention to have the time and interval
- information available on every node.
-
- The inaccuracy interval is available in NTP, too, but a Unix interface
- is not. While not specified in the NTP spec, a clamp should be placed on
- the frequency compensation term, like +-20 ppm or something like that.
- It would then be possible to make almost the same confidence statements
- about NTP as for DTS. The "almost" is because of the basic difference
- between the NTP selection/combining algorithm and the DTS intersection
- algorithm. These issues need to be discussed at another time.
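-
- The clamp makes a worst-case confidence bound easy to state: once the
- compensation term is confined to, say, +-20 ppm, the inaccuracy can
- only grow linearly at that rate between updates. A sketch, with
- hypothetical names and the 20 ppm figure used only as an example:
-
```python
def inaccuracy(base, elapsed, tolerance_ppm=20.0):
    """Worst-case inaccuracy bound after `elapsed` seconds of free
    run: the bound at the last update plus accumulated error at the
    maximum frequency offset (clamped compensation term plus quartz
    tolerance), in parts per million."""
    return base + elapsed * tolerance_ppm * 1e-6
```
- An hour of free run at 20 ppm adds 72 ms to the bound, which is the
- sort of confidence statement DTS makes natively.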
-
- ... The third point I'd like to discuss refers to Dave's statement
- about a "considerable Internet constituency which has noisily
- articulated the need for a multicast function when the number of
- clients on the wire climbs to the hundreds." Is it that they wanted
- multicast? Or was the real objection the practical difficulty of
- adding a second server to a LAN of 300 nodes and then having to
- change the server entry in 150 ntp.conf files to redistribute the
- load? ...
-
- The quote is quite correct. A number of dudes jumped on me to include
- multicasting in the spec. Their perception is more concerned with
- network load than with correctness; however, I readily admit they might
- not have yet become sensitive to the configuration issues you raise.
- However, I continue to believe the autoconfiguration issue transcends
- NTP and should be considered in a wider context.
-
- ... The next topic I'll discuss is the area of the DTS architecture
- I personally believe is least likely to survive the standardization
- process unchanged: The timestamp format. I think we have much
- agreement here.
-
- I still have a few scars left from old Internet wars on this point,
- including the use of binary versus character-oriented formats and
- whether leap seconds are meaningful. The fact that leap seconds cannot
- be reliably predicted seems to be a showstopper. As long as you must
- have institutional memory for them, it may be easiest to include epoch
- era information using the same mechanism.
-
- ... There has been a fair amount of discussion about interoperation
- of the two time services. Let me try to clarify what I said (too
- tersely) in my original response to Dave's paper. There are three
- separate cases I'd like to distinguish A) An isolated group of DTS
- systems which obtain time from NTP B) An isolated group of NTP
- systems which obtain time from DTS. C) A collection of DTS and NTP
- giving time to each other.
-
- I believe that without fairly major changes in one or both of the
- architectures case C is a problem, however it is an easily
- prevented problem (see below). Cases A and B however are very
- interesting, useful and I claim easily achievable.
-
- I see no problem with A and B either, even to the point of equating
- synchronization distance to inaccuracy interval. NTP would have to
- assign a stratum number to a DTS client and DTS might want to mark the
- synchronization distance provided by NTP as a "possibly unreliable
- inaccuracy interval." You have enough bits in the DTS timestamp to even
- do that, as well include leap warnings. I would even suggest amending
- the NTP spec to include such an interface specification and the DTS spec
- to include an identifier for the primary reference source. For this
- reason and in order to suppress neighbor loops, NTP includes the
- synchronization source in the header.
-
- ... Case C potentially breaks the invariants of both protocols. The
- DTS invariant is that UTC is contained within the interval. The NTP
- invariant (I'm less sure of my statement here) is that the
- frequency of good servers agree with UTC.
-
- In fact, the intent is to phase-lock all the clocks to UTC, which means
- both in frequency and time. This is not so much an invariant as a goal;
- although in practice it is achieved in much the same fashion as the
- power grid keeps your electric clocks humming UTC. Yeah, I tried that
- too and investigated whether the power grid itself makes a usable
- time/frequency transfer medium. Even though the guys in Columbus run the
- eastern-divide grid from a WWVB clock, local utilities drop off the grid
- from time to time and do not feel it necessary to maintain phase
- continuity, but that's another story among many other hilarities to be
- shared at another time.
-
- ... NTP has a further invariant, that there are no loops in the
- time distribution network. This is enforced by the stratum. Clearly
- if DTS took time from a collection of NTP servers and later gave it
- back to the same collection of servers, a loop could and probably
- would occur. There is a simple method to prevent this, I propose
- that the gateway described in case B above always declare itself to
- be at some fairly high (numerically large) stratum. Potential
- clients will ignore the DTS/NTP server in favor of servers which
- obtained their time exclusively via NTP (and have much lower
- stratum numbers). I'm assuming that the NTP implementation at the
- gateway can be coerced into using a fixed stratum and would propose
- a value of 16 for this purpose.
-
- This is a useful approach and requires only minor mods to the NTP spec.
- However, please don't underestimate the importance of the stratum, which
- is useful to avoid instabilities such as spurious clockhopping and
- loops.
-
- ... There's also a stratum zero which is supposed to be used when
- the stratum is unknown, however I'm not sure I understand what
- value servers which obtain their time from a stratum zero server
- will use. Do they use zero? If so, how are loops prevented amongst
- themselves? ...
-
- Stratum zero means "undefined," which can mean a lot of things, usually
- that the client has not yet synchronized. NTP peers will not synchronize
- to a stratum-zero peer, but will run the protocol so the peer can get
- synchronized.
-
- ... A few nits before we get to the meat of the discussion. Dennis
- is concerned that the drift has to be input as a management
- parameter. I'll show my vendor colors here and say this isn't a
- management parameter but an implementation detail.
-
- Dennis is not talking about the max frequency offset (I have problems
- with "drift," since in the communications field this means a change in
- frequency with time.), but with the estimated frequency offset produced
- by the local-clock algorithm, which is usually much lower. It can take
- an NTP peer some days to fine-tune the frequency, compliance and whatnot
- to produce stabilities in the .01-ppm range, so implementations can
- reduce the time to converge by remembering the offset and recovering it
- on reboot.
-
- The problem I have with arbitrary max's is that they are quartz-centric.
- You can certainly stamp the nameplate with the oscillator tolerance and
- expect it to be maintained throughout the service life of the equipment,
- but I sure wouldn't want to rely on the nameplate in our shop where
- machines are gloriously cannibalized and CPU boards routinely swapped.
- You could, of course, stash info like this in lithium along with the
- Ether address and license serial, but I doubt too many would take it
- seriously. Nevertheless, the same thing you say about DTS applies to
- NTP, assuming the clamp I mentioned previously is added to the frequency
- compensation. The nameplate would then specify the sum of the quartz
- tolerance plus the clamp.
-
- ... The second nit has to do with DTS's treatment of leap seconds.
- I appear not to have been clear here. Dave's original document was
- basically correct in its description of how DTS handles leap
- seconds - Servers increase their inaccuracy at the month boundary
- and a time provider narrows the interval later. When I wrote: "Each
- server has to maintain and propagate this state before the leap
- insertion. This is, of course, subject to Byzantine failures. A
- failing server can insert a bad notification." I was describing my
- understanding of (and a problem with) NTP's leap second handling.
- If my understanding of NTP is incorrect, I apologize, but the
- Byzantine problem seems real to me.
-
- I still am unsure how to incorporate a leap second into a prolific DTS
- subnet, if I can use the term. I assume all the DTS gizmos in the world
- will ramp up their inaccuracy intervals by one second at the end of
- every month. Let's say a leap second occurs and is eventually recognized
- by most of the radios (all extant radios sail right through a leap, only
- to lose synchronization later and then recover it). This takes a few
- minutes to a few hours. Servers and clerks will discover this fact from
- two to fifteen minutes later. While it is true that the inaccuracy
- interval will be correctly maintained, it may come as a shock to some
- users that for some uncertain period their timestamps suddenly took a
- one-second hit in accuracy. Assuming the time providers are told, either
- by radio or keyboard, that a leap is nigh, it would be possible to
- remove this ambiguity by stealing a bit in the timestamp format.
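-
- The month-boundary rule described above reduces to a toy sketch
- (hypothetical names, not DTS specification code): every interval is
- widened by a full second until the leap status for that month is
- actually known, and narrowed again once a time provider reports it.
-
```python
def month_end_inaccuracy(inaccuracy, leap_known):
    """Widen the inaccuracy bound by one second while a possible
    leap at the month boundary has not yet been confirmed or ruled
    out by a time provider."""
    if not leap_known:
        return inaccuracy + 1.0
    return inaccuracy
```
- The shock to users is visible right here: for minutes to hours the
- bound carries an extra whole second regardless of how good the time
- actually is.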
-
- ... Dave asked if I am in substantial agreement with the
- statistical models presented in his first document. I agree with
- most of this section. My only significant disagreement is with the
- last paragraph. It is true that DTS assumes that a system's clock
- drifts at a rate less than its manufacture's specification, and
- that a hardware time provider operates within specification. The
- probabilities of these assumptions being false are on the order of
- magnitude of other hardware failures. Software implementations do
- not routinely checksum memory any more (and they certainly don't do
- it to find memory errors). Violations of these assumptions
- represent faults, just as real as processor faults, and should be
- fixed. Note the long tails you observe in the distributions in the
- Internet are on message transmission times and the like. These
- parameters are dynamically measured in the DTS algorithm. Wick
- Nichols stated: "DTS is willing to accept historical estimates of
- the probability that a clock will go faulty (with checks for
- faultiness), but is not willing to accept historical estimates of
- current network characteristics." in his discussion of this point
- for the OSF.
-
- I'm struggling for a way to state my position in the most compact way I
- can, while still being fair to both the DTS and NTP models. I think we
- both agree that there is a tradeoff between accuracy and correctness.
- There will always be some tails in the error distributions for time
- providers, servers and network paths, as we have amply demonstrated over
- the years. Dennis' comments about missed clock interrupts are on the
- mark, as well as pragmatic mysteries on reading clocks in real operating
- systems. Your example of log coordination is a good one, as I have been
- using NTP for several years doing just that. However, in my battles with
- the old NSFNET Phase-I backbone, it was essential that transcontinental
- events (on the backbone) could be tagged accurately within ten
- milliseconds or so. Even today I expect NTP to correctly tag the
- Norwegian atomic clock to within half a second, in spite of gross
- misconduct across the Atlantic, as you may have seen. I thus see neither
- the statistical approach of NTP nor the correctness approach of DTS as
- necessarily "right," just different.
-
- I did a little experiment you might enjoy. Using the simulator mentioned
- previously and first the NTP and then the DTS selection algorithms, I
- purposely wiggled the offset of one of three clocks from nominal to a
- couple of seconds off, the idea being to create a falseticker in
- gradually increasing steps. The inaccuracy interval in both cases was
- calculated as in DTS, but without a time dependence, since the intervals
- between updates are small and the residual frequency offset is very
- small. While I hardly have enough data to make a definitive judgement, I
- can say that NTP quickly tossed out the bad clock, while DTS hung on for
- dear life rather longer than I would expect. I intend to play with this
- some more.
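-
- For concreteness, the DTS-style selection is essentially Marzullo's
- interval-intersection algorithm, which can be sketched as follows (a
- simplified illustration, not the DTS specification code):
-
```python
def marzullo(intervals):
    """Find the smallest interval lying within the largest number of
    the given (lo, hi) correctness intervals.

    Sorting (point, kind) tuples puts a -1 "enter" edge before a +1
    "leave" edge at the same point, so touching intervals count as
    overlapping.  Returns (count, (lo, hi))."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))   # entering an interval
        edges.append((hi, +1))   # leaving an interval
    edges.sort()
    best = cnt = 0
    best_lo = best_hi = None
    for i, (point, kind) in enumerate(edges):
        cnt -= kind              # -1 raises the cover count, +1 lowers it
        if cnt > best:
            best = cnt
            best_lo = point
            best_hi = edges[i + 1][0]
    return best, (best_lo, best_hi)
```
- Given clocks asserting [8,12], [11,13] and [14,15], the algorithm
- returns the two-clock intersection [11,12]. Note that a falseticker
- whose interval still brushes the majority's remains in the
- intersection, which may explain why the bad clock hung on.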
-
- My experiment pointed out a possibly noxious issue. I was at pains to
- make sure the inaccuracy interval was computed correctly, starting from
- the primary reference clocks that were in fact peers of the Norwegian
- chimer. However, the path is quite noisy, with the effect that the
- customer receiving the time-inaccuracy stamps can get wildly differing
- inaccuracy intervals on successive samples. It would seem the DTS
- customer would
- have to accumulate a number of samples if only to make sure the
- inaccuracy interval was reliable. In principle, this is the same
- strategy you suggest for time providers. I don't see anything
- necessarily wrong with this, but it does demonstrate that escape from
- probabilistic mechanics probably contradicts the third law of
- thermodynamics.
-
- ... Dennis asked a question about DTS authentication in the
- Internet environment. What I personally would like to see is an
- implementation of DTS using Apollo's NCS which in turn used
- Kerberos authentication. This is basically what Digital has
- proposed to the OSF in response to their distributed computing
- request for technology.
-
- My friends the electric spooks tell me Kerberos has real conceptual
- problems and that we should salute SNDS instead. Believe it when they
- tell us how to implement KMP and Firefly. Be advised it takes more than
- 100 ms to calculate the NTP cryptosum in an LSI-11/73 (yeah, I know I
- deserve that) and this cannot be compensated unless the protocol can
- measure and adjust the timestamps accordingly (my ISO friends are much
- aggravated by that position).
-
- ... Now to the major contention of Dennis's review, that accuracy
- and ease-of-management are "completely and utterly orthogonal". I
- disagree with this less than a reader of my response to Dave might
- think, though I am somewhat in disagreement with it. What I hold is
- that ease-of-management, provability and accuracy for a time
- service are all interrelated.
-
- Those guys who actually do mount and run large NTP subnets (Merit
- runs 150 chimers in the NSFNET backbone alone, most of which have
- identical configuration files) can speak eloquently about their own
- hardships. That's not to say I don't believe you, just that others
- should make the NTP case.
-
- ... The problem with just adding the NTP local clock model to DTS
- (as I understand the NTP local clock model) is that the resulting
- system could have wild instabilities. (Maybe my understanding of
- NTP is incorrect here.) The dynamic nature of the DTS
- autoconfiguration rules (couriers choosing a random member of the
- global set for instance) means that the input time driving the local
- clock model will have what Dave calls "phase noise". As I
- understand NTP's local clock model this is where the instability
- creeps in.
-
- It's not so much the phase noise as it is the dynamics of the local
- clock loop itself, sort of like requiring tickadj to have a confined
- range of adjustment. The rate at which the loop corrects for time and
- frequency errors is fundamental to its stability; otherwise, it could
- surge in much the same fashion as if you tried to drive a car with a
- half-second delay between the steering wheel and the steered wheels. I
- believe, as I hope is demonstrated in NTP implementations, that
- appropriate parameters can be specified and engineered for any
- implementation, either DTS or NTP in much the same way that tickadj is
- engineered now, even on an optional (configured) basis. I envision a
- local-clock implementation appropriate for either NTP or DTS or TSP for
- that matter by selecting either model with engineered parameters
- determined only on the basis of whether you have a line-frequency
- oscillator, an uncompensated quartz oscillator or a GPS receiver or
- atomic clock. This is in fact the fuzzball implementation.
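-
- The surge argument can be made concrete with a toy type-II loop; the
- gains here are illustrative only, not the engineered NTP constants:
-
```python
class ClockLoop:
    """Minimal type-II clock discipline: each update applies a phase
    correction proportional to the measured offset and accumulates a
    frequency correction.  Raise the gains too far relative to the
    update interval and the loop surges instead of converging."""
    def __init__(self, kp=0.5, kf=0.05):
        self.kp = kp        # phase gain per update
        self.kf = kf        # frequency gain per update
        self.freq = 0.0     # accumulated frequency correction (s/s)

    def update(self, offset, interval):
        """offset: measured error (s); interval: seconds since the
        last update.  Returns the phase step to apply now."""
        self.freq += self.kf * offset / interval
        return self.kp * offset

    def free_run(self, interval):
        """Phase accumulated from the frequency term alone."""
        return self.freq * interval
```
- The same skeleton could serve NTP, DTS or TSP; only the engineered
- gains change with the oscillator (line-frequency, bare quartz, GPS
- receiver or atomic clock).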
-
- ... Further, the existing NTP protocol avoids loops by using a
- stratum concept, again the DTS autoconfiguration happily produces
- loops. As I noted previously this doesn't affect the DTS algorithm,
- but they would cause havoc for NTP. Again one could add complexity
- to the DTS algorithm to prevent the loops, but I claim one would
- pay a price in system management cost.
-
- I perceive the DTS model does not consider more than three "strata"
- (global server, local server, clerk) necessary in a DTS subnet, right?
- If this can be assured, NTP is in fact needlessly complex. However, we
- are not building NTP subnets this way and have found it necessary to use
- a richer hierarchy requiring more strata, even if some LANs have their
- own time providers. One reason for this is a notorious distrust of time
- providers, so all NTP primary servers chime (usually at least three)
- other servers, not even necessarily primary servers. Also, even in this
- university, which is hardly at a loss for time providers (!) we have
- many stratum 4 and probably stratum 5 chimers even now.
-
- ... Another problem (according to Dave) is that the resultant phase
- locked loops have to be analyzed in the light of assumed
- probability distributions, etc. and one does not end up with the
- sort of proofs of correctness that are what is liked in DTS. There
- is one interesting aside on this last point. I believe there is a
- way one could add clock training to the DTS model and preserve the
- correctness. If the training algorithm decides to change the rate
- of the clock by some amount, *increase* the maximum drift rate by
- that amount. I believe this can be shown provably correct by the
- techniques in Marzullo's thesis. However, while this improves the
- precision of the time (the intersystem phase differences and rate
- differences will be smaller) the inaccuracy (the guarantee given to
- the user) will be worse! That DTS has chosen not to do this is, of
- course, the basic philosophical difference about what's important
- showing up again. However, the existence of at least one method to
- incorporate clock training into a provable system gives hope that
- both camps can be satisfied and in particular that the large body of
- work on the NTP local clock model can be incorporated. I am not
- (yet) expert enough in the NTP local clock model to see my way
- through this.
-
- While it may be that the inaccuracy interval provided to users may
- degrade slightly if frequency compensation is embraced, the inaccuracy
- jiggles certainly can't be as bad as the rock-n'-roll I see with the
- simulator and Norway data. I sense there is room to jostle on this
- issue.
-
- ... The other obvious possibility is to just add the
- autoconfiguration to NTP. To an extent, this is occurring. The
- multicast functionality clearly addresses the ease-of-management
- issue. However, for NTP servers, I claim that choosing the right
- server is important enough that it can't be left to an algorithm.
- Switching at random between servers reintroduces the clock-hopping
- problem (and the extra phase noise produced by the clock-hopping
- will cause problems for NTP). One could attempt to just pick a set
- of servers at random and stick with them for some long time to
- reduce clock hopping, but that will produce serious sub-optimality
- in the case of a changing network configuration (The particular
- servers being synchronized with might become cut-off from their
- good time sources, or the paths to them might involve links which
- become overloaded, and this wouldn't be discovered for a long
- time.)
-
- The reason for the NTP selection algorithm is to find the "best" clocks
- from a population possibly including "poor" ones on the basis of
- estimated accuracy, stability and so forth. The selection and weighting
- factors are dynamically determined using what I hope are sound
- statistical principles and are considered so important that only a
- purpose-designed algorithm could do it, much less an autoconfiguration
- scheme. I think what you have in mind are discovery and configuration
- issues, which certainly could be improved were DTS algorithms to be
- mimicked.
-
- Do you hear a faint echo of the old Xerox Clearinghouse in the far
- distance?
-
- ------------------------------------------------------------------------
-
- Date: Fri, 30 Mar 90 08:30:48 PST
- From: comuzzi@took.enet.dec.com
- To: mail11: ;, "@dts-ntp.dis" <UNKNOWN@decpa.pa.dec.com>
- Subject: More discussion of NTP and DTS
-
- Dennis,
-
- Sorry for the delay in responding, I wanted to review where we are and
- try and summarize our positions. It happens that I'm not an expert in
- control theory. Mike Soha, who is a student of that subject has already
- responded to your discussion. I look forward to a continued lively
- exchange.
-
- Dave,
-
- You've asked this question twice (whether DTS will appear as an Internet
- RFC) and it deserves an answer. It turns out that about three or four
- months ago Ross Callon tried to elicit interest in DTS in the Internet
- Engineering Steering Group (of which he is a member). Ross wanted to
- submit an RFC, but the response was not encouraging; basically he was
- told nobody wanted to think about time services. I agree with your
- observation that it would have been better to have these conversations
- earlier. I'm assuming the IESG's lack of interest will change if OSF
- selects DTS as one of its Distributed Computing Environment
- technologies. If that happens however, the architecture specification
- will probably be tweaked to reflect other OSF technology selections,
- such as which nameservice, RPC, etc., but we will submit an RFC.
-
- Even if DTS is not selected, I believe it would still be a good idea for
- DTS to appear as an RFC (I suspect the relevant powers would entertain a
- DTS RFC if you supported it).
-
- Now continuing the discussion, Dave's observation that DTS is not dead-
- set against clock training is in fact correct. Clock training can be
- viewed as orthogonal to the DTS architecture, though of course any
- training would have to be done in a way which preserved the Marzullo
- proofs. Mike describes in his note a simple proposal he has for
- incorporating clock training. The question I have personally been
- struggling with, and haven't had much success understanding is: In the
- future, if DTS incorporated the NTP local clock model how much of the
- rest of NTP would have to come along with it? In particular, would the
- various fields in the time response message (e.g., synchronizing
- dispersion) be required? (It seems that these have more to do with the
- clock selection algorithm.) Are strata required, or is the loop breaking
- mechanism of inaccuracy (synchronization distance) sufficient? Can the
- local clock model be proven to be stable for all input? Earlier, I got
- the impression that this could only be done by making rather strong
- assumptions about the network distribution functions, etc. Now I'm not
- at all sure of that, based on recent statements from Dave and Dennis. I
- hope this will fall out of the continuing 'control theory' discussion.
-
- Dave correctly observes that the DTS architecture only has a three level
- hierarchy. The DTS architecture has been extended (as part of the OSF
- submission) to permit the specification of a local set which is not
- autoconfigured. Basically, the local set is enumerated in a nameservice
- directory just like the global set. This permits construction of a
- hierarchy in which one level's global set is another level's local set.
- Obviously this is only autoconfiguring at the leaves, but it does permit
- construction of as complex a hierarchy as one would desire. We are *NOT*
- going to tell customers to do this, and the vast majority of customers
- will be quite content never thinking about strata or multi-level
- hierarchies.
-
- The real difference of opinion here is whether the majority of extended
- LANs will have their own time providers. DTS assumes that TPs are
- becoming commonplace. In that case, one only does WAN transactions as a
- check for inter-XLAN time differences, and to support small (TPless)
- LANs. For this environment, the short hierarchy suffices.
-
- Thinking about the "autoconfiguration vs. accuracy" issue in light of
- the different TP availability assumptions, I believe I understand the
- disagreement. (I think Dennis had figured this out too, I just hadn't
- read what he was saying!) DTS derives much of its ease-of-management
- because it only thinks in terms of a three level hierarchy, it
- autoconfigures the leaves, and it uses a single global set stored in the
- namespace. If NTP was used in a similar manner, then NTP could be made
- equally autoconfiguring. I agree. The point I was trying to push is that
- when you create a new server for NTP, you have to figure out where to
- put it in the grand hierarchy and further you have to select peers for
- it that will provide reasonable quality time. Now if you don't have an
- NTP hierarchy, then my argument is specious -- the same strategy that
- works for DTS would work for NTP.
-
- To summarize (I believe) Dennis' position: NTP can be made just as
- autoconfiguring as DTS. Indeed it already has some of the
- autoconfiguration due to its multicast mode; it just needs the discovery
- mechanism. The result would be more accurate than DTS, due to clock
- training. My position would reduce to:
-
- DTS is a simpler protocol which already has the autoconfiguration and
- could be made more accurate by adding some clock training. These
- positions are not that far apart. There is still a remaining
- disagreement: how much clock training or other complexity to add and
- what the optimal amount of complexity is from a cost-benefit analysis.
- Now, as I understand Dave on the autoconfiguration issue, his position
- is: TPs are not (and will not become in the near future) that
- commonplace, so more than a three level hierarchy is required. Hence
- only the leaves can be made autoconfiguring and there will always be
- some residual manual configuration. (I'm looking for an agreement on
- what our various positions are here; I am willing to accept that we
- disagree for now.)
-
- Considering the larger question of "manageability vs accuracy" there is
- still a lot of complexity in NTP which manifests itself as more
- management complexity. I observe there are at least three parameters
- (associated with the filter and selection algorithms) whose description
- includes the sentence "While the value of ... is suggested, the value
- may be changed to suit local conditions on particular peer paths" (or
- similar text). This seems to me a rather frank admission of more
- management (as opposed to more work for implementors). We seem to have
- no agreement (due to our different goals of how much accuracy to deliver
- in the first place) on these trade-offs.
-
- I believe there are two issues where we have reached complete agreement:
- Both Dave and Dennis seem to be in agreement that NTP can be modified to
- include a provable inaccuracy bound, equivalent to DTS's inaccuracy.
- (I'll stick to the DTS jargon because I'm more comfortable with it.
- Dave's term confidence interval is a reasonable name.) This interval
- could be provided to the end user. I believe this also. I hope that NTP
- will continue to evolve in this direction, independent of the OSF
- decision. Second, I believe we are in agreement on how (adequate)
- interoperation between the protocols could be achieved. I'll sign up to
- discuss this in the DTS RFC.
-
- I am curious about Dave's experiments with the DTS algorithm. Dave, can
- you provide more details (possibly out-of-band to this discussion)?
-
- This note is already longer than I intended, so I'll send it out. I'll
- keep the discussion of layering a time service over an authenticator
- (that Bill raised) for later.
-
- ------------------------------------------------------------------------
-
- Date: Mon, 2 Apr 90 10:00:22 PDT
- From: Michael Soha LKG1-2/A19 226-7658 <soha@nerva.enet.dec.com>
- To: dennis@gw.ccie.utoronto.ca
- Cc: decpa::"Mills@udel.edu", decpa::"elb@mwunix.mitre.org",
- decpa::"marcus@osf.org", soha
- Subject: My response to Dennis Ferguson's NTP vs DTSS memo - retry
-
- Dennis,
- When I first read this memo, I discounted it as part of an emotional
- discussion of why NTP is the right choice. However upon reflection I
- felt it necessary to respond to the many issues alluded to in the memo.
- The problem I have with this memo is twofold: there are several
- statements that are simply incorrect, and some of the deductive
- reasoning is questionable. I will apologize beforehand for my bluntness
- in this memo; however, I find it necessary due to the number of
- misconceptions in areas such as UTC, quartz oscillator theory, and time
- transfer techniques.
-
- As a general comment I find this discussion, NTP vs DTSS, becoming a
- very emotional debate. In an effort to move this discussion to a more
- technical plane I suggest that people specifically note their
- references; I have attached mine at the end of this response.
-
- TRAINING
-
- I'd like to begin with a discussion of one of my major concerns with
- NTP: Training of clocks (and the local clock model). Looking at Dave's
- NTP specification, Ref 1, page 36 I read "The Fuzzball logical-clock
- model, which is shown in Figure 3, can be represented as an adaptive-
- parameter, first-order phase lock loop, which continuously adjusts the
- clock phase and frequency to compensate for its intrinsic jitter,
- wander, and drift." And the last two sentences of this section (pp. 37-
- 38) read: "The effect is to provide a broad capture range exceeding four
- seconds per day, yet the capability to resolve oscillator drift well
- below a msec per day. These characteristics are appropriate for typical
- crystal controlled oscillators with or without temperature compensation
- or oven control."
-
- Given these statements I conclude: 1. that the clock model attempts to
- remove both short term (environmental) and long term drift components
- (aging and initial offset); and 2. for an uncompensated crystal
- oscillator one can attain a stability on the order of one part in
- 10**8 (ie, 1 msec/day). I disagree with both conclusions and that is why
- I cannot accept the NTP clock model.
-
- The influences on oscillator frequency [Ref 2-6] include:
-
- TIME - short term (noise), long term aging
- TEMPERATURE - static freq vs temp, thermal history (hysteresis)
- ACCELERATION - vibration, shock, acoustic noise
- IONIZING RADIATION - steady state, pulse
- OTHER - initial offset, power supply voltage, humidity, load impedance
-
- The above list has a number of contributors that are of more interest to
- DOD than the computer field. Reviewing a crystal oscillator catalog [Ref
- 6] one sees the main contributors being: time, temperature, initial
- offset and power supply voltage. Now the question becomes what are the
- relative magnitudes of these drift contributors. Let's look at an
- uncompensated (ie, without temperature compensation) 5 MHz CMOS Clock
- Oscillator, VECTRON part # CO-416B option 5. The numbers are:
-
- Accuracy at 25 degrees C: +/- 10ppm
- 0 to 50 degrees C: +/- 5ppm
- Aging 3 ppm/year first year
- 2 ppm/year thereafter
- Supply Voltage little impact since it's
- on the order of 10**-7/% change
-
- Now assuming one has no control over the environment (ie, it may be
- sitting in my office or tucked away in a closet) then the noise floor is
- on the order of +/- 5ppm (temp stability). This may seem a little
- extreme, but you have to realize that a protocol designer has little
- control over the internal temperature gradients of a computer system.
-
- Given the background noise of +/- 5ppm, one cannot measure the error
- associated with aging 3 ppm/year (approx 10**-9/day). The best one can
- expect to do is to get close to +/- 5ppm. The bottom line is that one
- cannot train an oscillator in an uncompensated environment because of
- the noise (ie, instability due to the environment). Given this, how can
- the NTP clock model achieve a stability of one part in 10**8 (ie, less
- than a msec/day) for an uncompensated oscillator? My conversations with
- other people in this field indicate that it is simply not possible.
-
- Now why did we pick a number like 10**4? Well, assuming a lifetime of 10
- years, the stability of this oscillator would be (at the end of 10
- years) about 36 ppm or 3.6 parts in 10**5 (5 ppm for temp, 21 ppm for
- aging, 10 ppm accuracy). We felt that one part in 10**4 would be true
- for most of the VAX crystal oscillators.
-
- It is my belief that one may be able to account for the error
- associated with the initial offset (ie, actual milling and polishing of
- the crystal). In the case of DTSS, one may be able to improve the clock
- stability from one part in 10**4 to about 1 part in 10**5. I would
- simply measure the error of the clock over a day (approx 10**5 seconds).
- Assuming I knew UTC to within 100 msec, I'd have enough significant data
- to calculate the oscillator drift to about one part in 10**5 (10**5
- seconds/100 msec = 10**6). To do this one would need a S/W clock that is
- not adjusted; it must be able to accumulate the error over a day. Once
- new tick value is determined, one could update the timer interrupt
- routine. Note that this correction is orthogonal to DTSS operation and
- need not be done that frequently; the 10**5 value would be correct for
- at least a year since the stability after one year would be +/- 8 ppm (5
- ppm for temp and 3 ppm for aging).
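- The one-shot calibration described above can be sketched as follows (a
- hypothetical illustration only; the function names and the 10 ms tick
- value are my assumptions, not part of DTSS):
-
```python
SECONDS_PER_DAY = 86_400

def estimate_drift_ppm(error_start_s, error_end_s, elapsed_s=SECONDS_PER_DAY):
    """Frequency offset in parts per million, from the error the
    unadjusted software clock accumulated over the elapsed interval."""
    return (error_end_s - error_start_s) / elapsed_s * 1e6

def corrected_tick_us(nominal_tick_us, drift_ppm):
    """New tick increment for the timer interrupt routine that
    cancels the measured drift."""
    return nominal_tick_us * (1.0 - drift_ppm * 1e-6)

# A clock that gains 0.864 s over a day is fast by exactly 10 ppm,
# i.e. one part in 10**5 -- the scale of improvement estimated above.
drift = estimate_drift_ppm(0.0, 0.864)      # 10.0 ppm
tick = corrected_tick_us(10_000.0, drift)   # 10 ms tick, shortened slightly
```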
-
- ------------------------------------------------------------------------
-
- Date: Tue, 3 Apr 90 5:12:09 GMT
- From: Mills@udel.edu
- To: Michael Soha LKG1-2/A19 226-7658 <soha@nerva.enet.dec.com>
- Cc: dennis@gw.ccie.utoronto.ca, comuzzi@took.dec.com, mills@udel.edu
- Subject: Re: My response to Dennis Ferguson's NTP vs DTSS memo - retry
-
- This message is in response to your reply to Dennis Ferguson's recent
- message about the NTP local-clock model and its implications. I would
- like to thank you and others at DEC for the time and care you have put
- into the recent message exchanges. I think we have all learned useful
- things that might be applied to ongoing and future projects. However, I
- want to make clear that my interest in pursuing this discussion is not
- to establish which of DTS or NTP is "better," but what can be learned to
- improve them or a future enhanced protocol. I realize the importance to
- DEC's agenda of capturing the standards process and have no personal
- interest in competing with this or obstructing it. Based on experience,
- however, I do want very much to promote that, whatever standard is
- adopted inside or outside the Internet community, the performance
- objectives attributed to NTP are at least potentially attainable, either
- in the emerging protocol stack or enhancements of it.
-
- I suspect Dennis might want to produce his own reply; however, I will
- respond to the technical points you raise. I am not including the text
- of either yours or Dennis' original message, since that might increase
- the bulk to unbearable levels.
-
- What Dennis has called "clock training" and I have called "frequency
- compensation" was introduced several years ago in the local-clock model
- adopted in NTP. The primary reason for doing this is to eliminate the
- need to precalibrate the inherent frequency offset of the reference
- oscillator and to serve as a digital filter to reduce the timing errors.
- In fact, the model was introduced prior to NTP and has evolved over
- several generations of time-transfer systems since 1979. As you know, it
- is described as an adaptive-parameter, first-order, type-II phase-locked
- loop (PLL), which is analyzed in many books, including those cited by
- each of us. While a type-I PLL is unconditionally stable, this type of
- loop cannot remove all timing errors, since it cannot compensate for
- frequency errors. A type-II PLL can do this, but this type of loop can
- become unstable, unless it is engineered according to established
- principles. The NTP PLL has been rigorously analyzed, designed,
- simulated and implemented according to these principles. The cost for
- this is additional architectural constants and tighter tolerances to
- maintain overall stability.
-
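- The type-I versus type-II distinction can be illustrated with a toy loop
- (a sketch only; the gains and update interval below are illustrative and
- are not the NTP spec constants):
-
```python
class TypeIIPLL:
    """Toy type-II loop: the integral term accumulates a frequency
    correction, so a constant frequency error is driven to zero
    steady-state phase error -- which a type-I loop cannot do."""
    def __init__(self, kp=0.5, ki=0.05):
        self.kp, self.ki = kp, ki
        self.freq = 0.0            # learned frequency correction (s/s)

    def update(self, offset_s, interval_s):
        """Fold the measured offset into the frequency estimate and
        return the phase slew to apply now."""
        self.freq += self.ki * offset_s / interval_s   # integral term
        return self.kp * offset_s                      # proportional term

# Track an oscillator that is fast by 10 ppm, updating every 64 s.
pll, err, f_err, T = TypeIIPLL(), 0.0, 10e-6, 64.0
for _ in range(200):
    err += (f_err - pll.freq) * T  # residual drift between updates
    err -= pll.update(err, T)      # slew out the measured offset
# The loop has learned the frequency: pll.freq ~ 10e-6 and err ~ 0.
```
-
- With these particular gains the discrete loop is stable (both poles lie
- inside the unit circle); as the text notes, a type-II loop is only
- conditionally stable, so the constants must be engineered with care.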
- The constants called out in the NTP spec were arrived at after
- substantial analysis, simulation and experiment using Internet paths
- ranging from high-speed LANs to those spanning the globe. A detailed
- mathematical analysis can be found in Appendix F of the February 1990
- revision of the NTP spec, which has not appeared yet as an RFC, but can
- be FTPed from louie.udel.edu as the PostScript file pub/ntp/ntp.ps or I
- can mail you a paper copy if you wish. By the way, the local-clock
- algorithm described in the existing spec, Section 5, has minor errors in
- a couple of places, including some of the recurrence equations. This
- section was completely rewritten in the revised spec and several new
- appendices added. I am currently working on another appendix on error
- analysis.
-
- An important principle in the design of the local-clock algorithms was
- that the protocol itself should not limit the possible application to
- precision time and frequency transfer and that it be scalable to very
- high speeds, hopefully beyond a gigabit. Surely, there are no hosts
- today that can achieve anything remotely close to 232 picoseconds, but
- there are a number of time-transfer applications using special equipment
- where NTP might be useful, including our own gigabit network research
- program. The same principles arise when synchronizing mundane computers
- on the Internet. Not all hosts can or even need to achieve millisecond
- time transfer and sub-ppm stability; however, I did not want the spec or
- algorithms to be the limiting factors.
-
- I have explored in depth the design and capabilities of the local-clock
- reference oscillators found in typical computing equipment and concluded
- the time and frequency transfer claims made in NTP are justified.
- Further discussion on this point can be found in my paper in the January
- 1990 issue of ACM Computer Communication Review and in my paper to
- appear in IEEE Trans. Communications. PostScript versions of these
- papers can be FTPed from louie.udel.edu as the PostScript files
- pub/ntp/ccr.ps and pub/ntp/trans.ps or I can mail you paper copies if
- you wish.
-
- You call out specifications of a typical uncompensated quartz oscillator
- as:
- Accuracy at 25 degrees C: +/- 10ppm
- 0 to 50 degrees C: +/- 5ppm
- Aging 3 ppm/year first year
- 2 ppm/year thereafter
- Supply Voltage little impact since it's
- on the order of 10**-7/% change
-
- These specifications are in fact much better than those I find in
- typical computing equipment, where frequency inaccuracies up to 100 ppm,
- temperature sensitivities up to 1 ppm per deg C and aging rates up to
- 0.1 ppm per day (36 ppm per year) have been measured. By contrast, the
- $700 Isotemp 5-MHz OCXOs used here have a specified stability of +/-
- 5x10^-9 from 5 to 55 deg C and aging rate of 1x10^-9 per day after 30
- days. We keep them honest with a cesium oscillator calibrated by USNO.
-
- In spite of widespread mediocrity and while only a few NTP servers are
- equipped with precision oscillators (two have cesium oscillators, three
- have OCXOs and one a TCXO), the vast majority of NTP-controlled
- oscillators can hold frequency surprisingly well. In these oscillators
- the dominant error term is neither noise nor short-term stability, but
- temperature sensitivity. Under typical indoor conditions both in and out
- of machine rooms, I have often opened the PLL loop at a primary server
- and found it within a few milliseconds of reference time after coasting
- for some days without outside correction.
-
- Of course, not all oscillators conform to these anecdotal observations;
- however, a goal in the NTP design was to provide the highest possible
- performance with whatever oscillator is available. In fact, one reason
- for the adaptive-parameter design was to automatically optimize the loop
- bandwidth for the particular oscillator stability characteristics, with
- the baseline assumed on the basis of expected diurnal variations of a
- few ppm over the 24-hour period. You quote the spec:
-
- The effect is to provide a broad capture range exceeding four
- seconds per day, yet the capability to resolve oscillator drift
- well below a millisecond per day. These characteristics are
- appropriate for typical crystal controlled oscillators with or
- without temperature compensation or oven control.
-
- The intent is to state that the characteristics are appropriate for
- oscillators with and without temperature compensation (indeed, the loop
- adapts to each type) and (with the appropriate oscillator) stabilities
- of a millisecond per day are achievable. I hope the quote was not
- misleading.
-
- For clarification on a few other points you raise, note that the NTP PLL
- does not attempt to compensate for quartz aging, which results in a
- gradual change in frequency over time. This of course requires a type-
- III PLL, which is in fact used in some disciplined secondary frequency
- standards built for digital telephone network synchronization; however,
- I did not feel in this case that the additional complexity required
- would be justified. I did in fact experiment with a second-order type-II
- PLL in order to further minimize the phase noise, but this raised
- problems due to the tight constraints on update intervals. The type-II
- loop is stable throughout the range that results in two-way
- reachability.
-
- Your note suggests an alternative to the perceived complexity of the NTP
- PLL is a manual observation of the frequency error measured over a day,
- which could presumably be done at installation and saved in a file for
- recall at system reboot. This is exactly what Dennis has done. It might
- be just as easy to equip the oscillator module with a trimmer capacitor
- and trim out the error when the module is built. However, the intent in
- NTP was to do this automatically, with the startup value used only if
- available and in order to reduce the initial convergence time.
-
- Following are specific responses to some of your technical comments.
- Dennis may have some more of his own. Your quote from "Modern Control
- Engineering" by Ogata, p. 7:
-
- From the point of view of stability, the open loop control system
- is easier to build since stability is not a major problem. On the
- other hand, stability is always a major problem in the closed loop
- system since it may tend to overcorrect errors which cause
- oscillations of constant or changing amplitude.
-
- You go on to say a type-II system is more apt to be unstable than a
- type-I. Since both DTS and NTP derive timestamps relative to the local
- clock, both are certainly closed-loop systems. As mentioned previously,
- type-I systems (DTS) are unconditionally stable; however, type-II
- systems can be stabilized through good engineering design such as
- alleged in NTP. I can't answer the question of whether this is the best
- design appropriate for all conditions; however, the design has been
- validated over the sometimes ludicrously large envelope of conditions
- found in Internet LANs and WANs over the past decade.
-
- Your comment on "UTC accuracy to the nanosecond" requires frequent trips
- to USNO, of course. The proper statement should be "time transfer to the
- subnanosecond, UTC transfer to the limits of the available timekeeping
- components and time provider." GPS can achieve precisions of a few
- nanoseconds only if the ephemeris dither is turned off, which after the
- recent announcement is not likely. Kinemetrics claims their GPS time
- provider (with GPS receiver actually manufactured by Rockwell Collins)
- is accurate to 100 ns relative to USNO and 250 ns relative to UTC. As to
- the CCIR expectation of UTC dissemination to the microsecond, the whims
- of the US Congress were not respected. Current legislation requires
- LORAN dissemination to 500 ns and the Coast Guard expects to improve
- that to the order of 50 ns. Judging from measurements made by my grad
- student and published USNO corrections, there does not seem any chance
- to achieve that. On the other hand, NTP was also intended for local time
- transfer, for which the 232-ps resolution would seem to be justifiable.
-
- Your comments on NTP's lack of formal proofs are well taken. While these
- goals may have been neglected with respect to goals of performance, I
- think we all agree that minor changes can easily be made to NTP with the
- effect that claims similar to those made for DTS can be made for NTP. In
- fact, as an experiment I crafted Marzullo's algorithm into the NTP
- simulator I use for evaluation and am testing it as part of the
- algorithmic components. The result is no decrease in accuracy and,
- presumably, a correctness capability. Note that the NTP "inaccuracy" is
- calculated in the same way as DTS, but includes only a maximum bound on
- the frequency error per day.
-
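- For readers unfamiliar with it, Marzullo's intersection algorithm can be
- sketched in a few lines (an illustrative version, not the code used in
- the simulator):
-
```python
def marzullo(intervals):
    """Find the smallest interval contained in the largest number of
    the given (lo, hi) confidence intervals.  Every correct clock's
    interval must contain the true time, so the best overlap region
    must as well."""
    edges = sorted([(lo, -1) for lo, hi in intervals] +
                   [(hi, +1) for lo, hi in intervals])
    best = cnt = 0
    best_lo = best_hi = None
    for i, (v, typ) in enumerate(edges):
        cnt -= typ                      # -1 opens an interval, +1 closes one
        if cnt > best:
            best, best_lo = cnt, v
            best_hi = edges[i + 1][0]   # overlap region ends at the next edge
    return best, (best_lo, best_hi)

# Three servers' confidence intervals; all three overlap in [11, 12]:
marzullo([(8.0, 12.0), (11.0, 13.0), (10.0, 12.0)])  # -> (3, (11.0, 12.0))
```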
- You make an important point about estimating the state of the network at
- one time based on observations about its state at another. I worry about
- this, too, and have accumulated a rather large collection of measurement
- data between NTP servers in the Internet. Some conclusions on this issue
- can be found in the papers cited above. In particular, I have used NTP
- on many occasions as a management tool to detect changes in network
- routing and as an alert for congestion conditions. The fact that it runs
- continuously and produces accuracies to the order of a few milliseconds
- on most primary and secondary servers relative to extant path delays has
- proved a highly useful diagnostic tool.
-
- Your comment that one-way delay asymmetries can lead to estimation
- errors applies to both DTS and NTP, of course. However, most NTP servers
- run the protocol with at least three peers via diverse paths and some of
- them use the algorithms described in the 1978 NBS monograph you
- reference to reduce the errors. The February 1990 spec revision
- describes how this is done and presents the statistical justification
- for it. In practice, asymmetries as large as the 100 ms you report are
- quite rare on the Internet, although those of 10-20 ms are common and
- some (US-European) paths are as high as 70 ms. Mixed satellite-
- terrestrial paths have in the past haunted us, but the only ghost left
- now seems the USAN network.
-
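- The classic two-way calculation, and the error a path asymmetry induces,
- can be sketched as follows (variable names are mine; the formulas are
- the standard two-way time-transfer ones):
-
```python
def offset_delay(t1, t2, t3, t4):
    """Two-way time transfer: t1 = client send, t2 = server receive,
    t3 = server send, t4 = client receive (seconds).  The offset
    estimate assumes the outbound and inbound delays are equal."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# True offset is zero, but the path is asymmetric: 30 ms out, 70 ms
# back.  The estimate is wrong by half the asymmetry (20 ms here),
# which is always bounded by half the round-trip delay.
offset, delay = offset_delay(0.0, 0.030, 0.030, 0.100)
```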
- While I realize our recent message exchanges have required substantial
- time investments for each of us, I would like to again emphasize the
- value of an ongoing dialog within the research and engineering
- communities. I have strived to maintain an objective and productive tone
- in these exchanges and would like to encourage you to share ideas,
- experiences and even flames with us and to participate in experiment
- opportunities as they develop.
-
- ------------------------------------------------------------------------
-
- Date: Tue, 3 Apr 90 19:42:38 EST
- From: Dennis Ferguson <dennis@gw.ccie.utoronto.ca>
- To: soha@nerva.enet.dec.com
- Cc: comuzzi@took.dec.com, mills@udel.edu
- Subject: Re: My response to Dennis Ferguson's NTP vs DTSS memo - retry
-
- I must apologize for the tone of the last message. Chalk it up to a
- little bit of frustration concerning the course things seemed to be
- taking. Let me make it very clear that my interests are in good quality
- network timekeeping. I have some understanding of the issues, and I like
- implementing software which does this stuff well. I have no emotional
- attachment to either NTP's, or anyone else's, packet or timestamp
- formats, nor do I stand to gain much benefit from playing with this
- stuff. I will likely do a DTS implementation no matter what form it ends
- up being standardized in, I'm just not going to like it much if it isn't
- a good protocol.
-
- What I do like is good quality timekeeping. I suspect NTP's encounter
- with DTS will make it a better, more usable protocol. At this point you
- can bet that NTP version 3 will include a correctly determined
- uncertainty interval, and very likely won't go out without an
- autoconfiguration protocol and procedures for authentication key
- management as well as an SNMP MIB. None of these conflict with the
- machinery that NTP already includes, which is very good at keeping your
- clock accurate.
-
- What bothers me is that I would not like to see an international
- standard timekeeping protocol which is substantially less accurate than
- NTP, just because at this stage there is just no reason for it. The idea
- of not providing frequency compensation for your system clock is hence a
- wee bit shocking, and I also see no reason for you to ignore the NTP
- clock selection/combination procedures as long as the offset it produces
- lies within the uncertainty interval. Incorporating the authentication
- procedures within the protocol, while unclean, also has its advantages.
-
- In any event, I won't go on for long here since Dave has covered most of
- the issues and I'd rather spend the time implementing whatever he
- produces in the way of "correct NTP", if only to prove that one can be
- both correct and accurate with one's timekeeping.
-
- Just a couple of additions to Dave's comments on frequency compensation
- (I didn't call it clock training, either) of the local clock. The code
- implements a PLL; this is certainly covered in Time and Frequency
- Fundamentals and is commonplace in the time keeping industry. Indeed,
- if you pry the cover off of a good quality IRIG-? time code receiver (I
- have technical documentation for several made by Trak Systems), or even
- a good quality WWVB or GOES receiver, you will very likely find a
- microprocessor inside which implements pretty much the same procedure to
- synchronize the local oscillator (this is often called a "disciplined
- oscillator" in the advertising brochures).
-
- Further, NTP's local clock really is separate and distinct from the
- network time exchange protocol (whether NTP or DTS). It is actually part
- of the kernel in fuzzballs, and Dave has been encouraging Louie Mamakos
- to insert it into the 4.4 BSD kernel behind the adjtime() interface
- (something which I have some reservations about, but certainly not on
- the basis of it being NTP-specific. It just isn't). NTP's local clock is
- not a timekeeping protocol, it sits behind the timekeeping protocol and
- receives offset estimates from the latter. Your comment about "DTS and
- NTP are both timekeeping protocols" was way off the mark.
-
- You are right that there are tradeoffs between the type II control loop
- that NTP uses and DTS' type I control. Note that an error in frequency
- in essence presents a ramp input, whose slope is the frequency error (or
- drift), to your control loop. The slope of the input sometimes changes
- (usually with temperature, I have a plot pegged to the wall beside my
- desk of the temperature calibration of the crystal in my workstation,
- measured with NTP. The slope is about -1.1 ppm/C), though changes to the
- frequency error are normally quite small compared to the longer-term
- average.
-
- The NTP type II loop does several things right. First, it tracks the
- ramp input (i.e. the frequency error) with zero steady state error. It
- will also track changes in the frequency error (i.e. "drift" variations)
- fairly accurately if they occur slowly enough. DTS can't do this, it
- exhibits a steady state error when tracking a ramp proportional to the
- slope. Changes in the slope change the steady state error.
-
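- The steady-state behaviour on a ramp is easy to see numerically. This
- sketch uses the 100 ppm, 15-minute-interval case discussed in this
- exchange (illustrative code, not DTS itself):
-
```python
F_ERR, T = 100e-6, 900.0   # 100 ppm oscillator, 15-minute updates

def type1_peak_errors(n_updates):
    """Clock error just before each update when the full measured
    offset is corrected every time but the frequency never is
    (type-I behaviour on a ramp input)."""
    err, peaks = 0.0, []
    for _ in range(n_updates):
        err += F_ERR * T   # ramp: drift accumulated over the interval
        peaks.append(err)
        err = 0.0          # correct the whole offset at once
    return peaks

# Each peak is ~0.090 s: the error sawtooths between 0 and 90 ms,
# for a time-averaged (steady-state) error of 45 ms.
peaks = type1_peak_errors(4)
```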
- Second, because the time constant of the loop is fairly long, the NTP
- type II loop tends to damp out statistical noise in the data you are
- trying to phase lock to (i.e. the offsets produced by the time exchange
- protocol). DTS' type I loop, however, allows this jitter right through
- to the system clock.
-
- Third, while unrelated to type I versus type II, the NTP local clock
- applies an adjustment to the system clock by making little tiny
- adjustments once every 4 seconds, rather than all at once. This
- spreading out of the adjustment avoids the high frequency jitter that is
- otherwise caused. Note that for the archetypal, 100 ppm frequency
- error, DTS synchronized clock, we've agreed that the system clock will
- be zooming around by 90 ms over the 15 minute update interval, with an
- average (steady state) error of 45 ms. The NTP local clock, when faced
- with a 100 ppm error for which it doesn't know the correction (this can
- happen when you run the daemon on a machine for the first time, it is
- analogous to a large step change in the slope of the signal you are
- tracking) will exhibit a transient error which is initially about 40 ms
- in magnitude as well, but this will be stable without the big jitter.
-
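- The spreading-out described above can be sketched as an exponential
- slew (the 4-second interval matches the text; the gain is an
- illustrative assumption, not the spec value):
-
```python
SLEW_INTERVAL = 4.0   # seconds between micro-adjustments (as in the text)
GAIN = 1.0 / 16.0     # fraction of the residual applied per step (assumed)

def slew(offset_s, n_steps):
    """Residual offset after n micro-adjustments, each applying only
    a small fraction of what remains -- no visible jump in the clock."""
    for _ in range(n_steps):
        offset_s -= GAIN * offset_s
    return offset_s

# A 40 ms startup transient decays smoothly to ~1.6 ms after 50 steps
# (50 * SLEW_INTERVAL = 200 s), instead of sawtoothing by 90 ms.
residual = slew(0.040, 50)
```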
- What you lose for this is speed in correcting step changes in the clock
- condition. DTS can correct the system clock offset which can occur at
- startup in the first update. Similarly, DTS will almost immediately
- assume the steady state operating condition as soon as it is started on
- a machine. Offsets due to lost clock interrupts are corrected within an
- update or two as well. The NTP local clock takes longer to correct these
- types of events, and/or requires a higher update rate while doing so.
-
- Now the tradeoff. Local oscillators whose frequency is wrong, and whose
- frequency varies, are a fact of life. This is very nearly always the
- case. Likewise, statistical jitter in the offsets produced by the
- timekeeping protocol (whether NTP or DTS) is equally unavoidable, so
- damping of these fluctuations is highly desirable.
-
- Thus the NTP local clock deals with the common case (a system clock
- whose frequency is inaccurate and which varies somewhat, and somewhat
- noisy data from the network) much more accurately than DTS' type I
- control. On the other hand, it corrects startup transients and responds
- to step changes due to things like lost clock interrupts, more slowly
- than DTS does (it deals with these things, but at a more stately pace).
- Note that the latter are exceptions, however, since you start the
- protocol infrequently and you hope you don't lose clock interrupts at
- all. Thus the NTP local clock optimizes performance in the ordinary case
- at some expense to speed when handling exceptional conditions. I think
- this is a good engineering tradeoff.
-
- I think frequency compensation of the system clock is a must for DTS,
- and I'm not going to be happy if it progresses towards standardhood
- without it. I'm also not convinced that you shouldn't be looking at the
- way NTP filters and selects samples, since this does measurably improve
- your time but certainly doesn't preclude the calculation of a correct
- uncertainty interval.
-
- I think I may have said enough, since I obviously have yet to convince
- anyone. The more I consider it, though, the more I think the
- correctness-and-management versus performance tradeoff is just so much
- baloney. These issues are all separable, I see no reason why you can't
- have everything in one protocol. I think rather than trying to argue
- this position, however, it might be better to spend the time on
- producing a correct, autoconfiguring, accurate NTP that I can give to
- people so that no one at the standards committee will believe you if you
- try to foist this argument off on them.
-
- By the way, it occurs to me that an NTP which computes a correct
- uncertainty interval will allow us to make head-to-head performance
- comparisons between DTS and NTP. If both protocols compute provably
- correct inaccuracies, but one consistently produces a smaller inaccuracy
- on the same machines via the same network paths, is not the latter a
- better, more useful protocol? I think so, and I have a distinct feeling
- both NTP's local clock and clock filters are going to make this quite
- interesting. We may be able to convince you DTS needs some of this stuff
- after all.
-
- ------------------------------------------------------------------------
-